PREFACE


MENTAL and educational measurement, with its applications, now plays such a pervasive part in our educational system that it is essential that all practising and intending teachers should know something about it: the principles on which it is based, the application of these principles, and the difficulties and dangers encountered.

On the other hand, the usual treatment of the subject involves a good deal of mathematical calculation, for which many teachers and students may have no special aptitude and in which they are not specially interested.

It is mainly for the latter that this book is written. It is not meant to train in statistical calculation. It is meant to provide a reasonably adequate understanding of a subject which now plays such a large part in the educational field. The text contains no mathematical calculation (except the occasional multiplication of a number by itself or by another number), but, for the benefit of any who may wish to study the mathematics to some extent, an appendix is added explaining and illustrating the basic statistical calculations involved.


February 1954. C.A.R. 
Chapter One
MEASURING THE MIND 


IT is now the best part of a century since the first systematic, if rather stumbling, attempts were made to carry out what has come to be known popularly as ‘the measurement of mind’, and to put this process on a scientific basis.

We need not be critical of the phrase ‘measurement of mind’. No doubt it lacks precise definition, but the process which it names is certainly an attempt at measurement, and measurement directed at such qualities as intelligence, temperament, character, and educational attainment which, though themselves lacking precise definition, are commonly described by the term ‘mental’.

Measurement involves such things as scales and units 
and therefore requires, in its application, treatment by 
mathematical methods, and especially statistical methods. 

This effort to put at least one part of the field of 
psychology on a sound scientific basis has throughout 
its history been the subject of much controversy and 
often of violent opposition. It seems to arouse in many 
people deep feelings of repugnance, and even, sometimes, 
of fear and anger. Consequently many of the criticisms 
aimed at it are irrational, but they should not be lightly 
disregarded on that account, for they are by no means 
unnatural and they might well have unfortunate conse- 
quences for the progress of human thought and welfare. 
They arise ultimately, perhaps, from a deeply rooted and sometimes unconscious apprehension lest the intrusion of mathematics into the realm of mind may fetter the freedom of the spirit of man, and lead to a rigidly deterministic view of human constitution and behaviour.

I believe this apprehension to be unfounded, for I think it arises from a basic misunderstanding. My reasons for this belief will appear in the sequel, but meantime I would suggest that it is important that all who are interested in the development of a scientific psychology, and in the application of this for the benefit of mankind, should take the fears of their fellow men and women seriously and do all in their power to allay them.

Criticism of mental measurement takes two forms. 
The first line of attack is directed at the whole principle 
of such measurement. It is said that the individual human mind, essentially free and creative, with its endless subtleties and ramifications and its host of ‘imponderables’, is in no wise susceptible of measurement analogous to that employed so fruitfully in the material world.

Criticism of this kind generally has as its source the 
fear and repugnance to which I have already drawn 
attention. The answer to it can only consist in describing 
the purpose of mental measurement, the nature of the 
methods employed, and, especially, the kind of results so 
far achieved. 

With all these points we shall be concerned, but there is one fact of prime importance which should be mentioned now. Although we speak of the ‘measurement of mind’, we never actually measure directly what is commonly understood by ‘the mind’, namely mind in the subjective sense. Indeed we cannot, for measurement involves observation, and we cannot observe other people's minds. We can only observe their behaviour, and all our measurement is in fact based on observation of behaviour. When, for example, a child works a group intelligence test, all that we can observe is a piece of behaviour, namely the marks which the child makes in the test booklet. If we apply mathematical methods to these marks and base predictions on our results, what we are predicting, or trying to predict, is not something going on in the child's mind, but how he is likely to behave in certain other situations, such as those involved in following a particular course of education or a particular vocation. Any conclusion as to what exactly is going on in the child's mind must be an uncertain inference based on analogy with ourselves.

The fact that direct measurement can be based only on observation of behaviour need not, however, disturb us so long as our assumption that other people's behaviour and inner mental life are related in the same way as our own does not in practice lead to alarming or chaotic consequences. Moreover many of the purposes for which mental measurement is used are concerned more directly with the individual as a member of a social community or of some kind of group, and in this case his actual behaviour, taken over a considerable period of time, is a good deal more important than what exactly is going on in his mind.

The second line of attack on mental measurement takes the form of criticism of the methods employed. It has often been used to bolster up the first line. The instruments of measurement, especially objective tests, are said to have serious defects and are accused of failing to measure what they are supposed to measure.

In principle this approach to a critical survey is sound, for any technique, and especially a comparatively new one, will have its weaknesses and will be open to continuous improvement. But in practice the criticisms put
forward are often of the wrong kind and many of them 
are beside the point. For example some tests have been 
criticized because they did not assess something they 
were never meant to assess. Thus at one time it was 
quite common to hear intelligence tests criticized because 
they did not assess qualities of character. This is like 
criticizing a clinical thermometer because it does not 
record the patient's weight as well as his body temperature. 
It would have point only if intelligence tests were used 
alone, and regarded as sufficient by themselves, in situa- 
tions where it was important to assess character as well 
as intelligence. But no reputable psychologist would use 
intelligence tests in this way, and, if ignorant enthusiasts 
have ever done so, one can only say that an instrument 
should not be condemned because it is sometimes misused 
by an ignoramus. 

Again, to take another example from intelligence tests, 
these have often been said to be ‘merely superficial’, 
testing a certain facile quickness in the uptake—a sort 
of ‘Smart Alec’ quality—but failing to strike deep. 
Criticisms of this kind are commonly based simply on 
an inspection of the tests. But to judge an instrument 
just by looking at it is stupid. The important question is ‘How well does it do its job?’, and this can be answered only by experience in practice.

The fact is that psychologists are well aware of weak- 
nesses in their instruments and are continually seeking 
to eliminate them. It must however be admitted that some of the irrelevant criticisms we so often hear might never have been made if, from the beginning, the new science had been presented not so much in terms of the measurement of mental qualities (often vaguely conceived and defined) by certain instruments, as in terms of the actual jobs which those instruments were meant to
perform. But it is easy to be wise after the event, and it 
was no doubt inevitable that in this new field we should 
begin by speaking in terms made familiar by their use in 
the material sciences. 

It is of course true that objective tests and other instru- 
ments of psychological investigation can significantly 
be said to measure something, and what this something 
is is of great interest and importance for theory, and the 
inquiry into it may well have a valuable influence on 
practice. But the immediate criteria by which our instruments should be judged are these: what exactly are they meant to do, is it something worth doing, and how well do they do it?

One might perhaps summarize by saying that there is 
no reason to doubt the validity in principle of what may 
not inappropriately be called ‘mental measurement’; 
that this does not in itself imply anything about freedom 
and determinism although it does open a possible avenue 
to the better understanding, and therefore the better 
treatment, of human beings; and that criticism of the 
instruments of measurement, and the criteria by which 
these are judged, are more likely to be relevant and 
helpful when expressed, not in terms of what the instru- 
ments are supposed to measure, but in terms of the actual 
practical jobs they are meant to do.


Chapter Two 
METHODS OF MEASUREMENT 


THE fundamental method of measurement is by
direct comparison of like with like. This is seen 
in its purest form only in spatial measurement. We 
measure the length of a material body by laying along- 
side it another material body—a ‘ruler’ on which marks 
have been made to form a graduated scale. Conse- 
quently the great majority of methods of measurement 
in the physical field reduce to measurements of a length. 
Thus we measure temperature by the length of a column 
of mercury, and the strength of an electric current by 
the distance moved by a pointer over a scale attached 
to a special instrument. 

There are, of course, some exceptions to this. We measure time usually […] scales, the hands of […] do, measure it by […] temperature and electric current, the matter is considerably more complex and, as already pointed out, we have
to fall back on indirect measurement of effects which 
enable us to reduce the measurement to that of a length 
or distance. Thus one of the effects of change of tem- 
perature is change in the length of material objects 
affected by it. So we commonly measure change of 
temperature indirectly by this effect. 

When we pass from the physical to the mental field 
we are in still greater difficulties, for not only is direct 
comparison now out of the question, but indirect meas- 
urement through direct comparison of such physical 
things as lengths is equally impossible. A new kind of 
technique is required. This technique will be considered 
in the next section; but here some account must be 
given of the instruments which produce the results to 
which the technique is applied. 

Probably the most familiar of these instruments is the 
objective test. This generally, though not always, con- 
sists of a large number of short questions to be answered 
by underlining or ticking certain words or other symbols, 
or, sometimes, by writing single words or short sentences. But the important thing about the test is not the details but the principles of its construction, which, together with strict and simple rules of marking, are such as to ensure that the result of the test is independent of the marker. Scoring is mechanical, so that, apart from sheer carelessness, the same total of marks will be awarded no matter who scores the test. That is why the test is called ‘objective’.

One of the best known types of objective test is the ‘intelligence test’, which is designed to test natural ability as distinct from acquired knowledge, so far as this is possible. Here, for example, is an item from a ‘classification’ test, in which a number of words are given, one of which does not fit with the others and has to be picked out.


sparrow lark thrush lion eagle 


Evidently ‘lion’ is the answer, and has to be underlined. 
Here is an example from an ‘analogies’ test: 


cat, kitten—(mouse dog milk lap puppy) 


in which the two words in the bracket that are related in the same way as the two words outside have to be underlined. The answer is, of course, ‘dog’, ‘puppy’.

Again a ‘number series’ test: 


2 3 5 8 12 ...


in which the two numbers which come next have to be 
given. They are 17 and 23. 
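The rule behind the number-series item (each gap between terms is one larger than the gap before: 1, 2, 3, 4, then 5 and 6) can be checked mechanically. The following is a minimal Python sketch that assumes exactly that rule; the function name `next_terms` is illustrative only.

```python
def next_terms(series, count):
    """Extend a series whose successive gaps grow by one at each step.

    Assumes, as in the example item, that the gaps are 1, 2, 3, 4, ...
    so each new gap is one more than the last.
    """
    terms = list(series)
    gap = terms[-1] - terms[-2]      # last gap in the given series
    for _ in range(count):
        gap += 1                     # the next gap is one larger
        terms.append(terms[-1] + gap)
    return terms[len(series):]       # only the newly found terms

print(next_terms([2, 3, 5, 8, 12], 2))   # [17, 23]
```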

One way of securing objectivity by simplifying procedure is to limit the number of possible responses to a question, as in the first two examples given above. Thus it is common for a test item to require a choice to be made from a limited number of suggested answers. This clearly reduces considerably […] the test; and, indeed, […] tests (such as those used […] individual, […] response, b[…] of response was attempted, and det[…] types of response were classified, there were cases, by no means rare, in which the tester had to exercise his own judgment as to the class in which the response should properly be placed.
Accordingly, in the modern developments of group testing where, partly owing to the numbers involved, simplicity and objectivity of scoring are prime considerations, it is usual to find a limited choice of response in the test items. This has given rise to some criticism of the tests, but here, as always, it must be remembered that a test should be judged, not by the details of its construction, but by how well it does the job it is meant to do.

Attempts have been made, however, to devise methods 
of objective or partially objective scoring of such things 
as English composition, in order to meet criticisms of the 
apparent limitations of the short-answer tests. Scales have been drawn up consisting of specimens placed in order of merit by a group of experienced judges of English composition, or whatever the subject may be. The tester then has to assess a piece of work by comparing it with the scale and scoring it according to the specimen which he thinks it most nearly approaches in merit. But in view of the marked intrusion of the subjective element it is not surprising that this method is not very convincing and has not yet proved itself sufficiently in practice.

There are perhaps more fruitful possibilities for methods of measuring abilities in such fields as written composition indirectly, by seeking to discover objective short-answer tests which show high correlation with these abilities. It is true that, in doing this, we have to introduce the subjective element of pooled judgments by experts in comparing specimens of (say) compositions with the results of objective tests, but this is not nearly so serious as the subjective element introduced by requiring each person who uses the tests to decide where a given piece of work should find its place in the scale of specimens. Indeed a subjective element such as the pooled
judgments of experts is in the end natural and inevitable, 
for what we are always actually concerned with in such 
cases is not the merits of a piece of work as judged by some 
absolute independent scale, but as judged by the opinions 
of those most experienced and expert in the field. That is 
the only criterion which has any ultimate significance. 

Assessment by the pooled judgments of members of a 
panel of experts is in fact a recognized method of 
measurement of those mental powers or qualities for 
which it has not so far been possible to devise suitable 
objective tests of the usual type. The kind of technique 
employed will be considered in the next section. A dis- 
advantage of the method is its limited scope. It can be 
used on a large scale only with a considerable number 
of panels, and then the further problem arises of co- 
ordinating the standards of judgment of the different 
panels. But in particular cases, and where relatively 
small numbers are concerned, it is a useful and valuable 
method. 

Another method used in some kinds of mental 
measurement is that of the questionnaire. Those who 
are to be measured or to be made, as a group, the basis 
of measurement, are required to answer a list of care- 
fully devised questions covering the field concerned. 

The answers may be classified in such a way as to provide a scale of comparison. The method is especially appropriate in the measurement of such things as interests and attitudes of mind. Evidently honesty of response is an important factor, but it is usually possible to keep some check on this, and such evidence as there is seems to show that, in practice, any dishonesty there may be is generally not enough to upset the results seriously.
Most methods of mental measurement reduce to one of the three described (the objective test proper, the pooled judgment, and the questionnaire) or to a combination of these. The chief problem is to devise associated techniques which will do what is required.
Chapter Three


TECHNIQUE 


SCALES AND UNITS 


MEASUREMENT requires a scale, and a scale requires an origin or zero point to measure from, and a unit in terms of which to measure.

These requirements are easily met in the case of such 
physical characteristics as height and weight. In the 
case of height, for example, the zero point provides a 
natural origin of measurement, while marks at equal 
intervals of length on a ruler or its equivalent provide 
units of measurement. But there is no such obvious 
and more or less ready-made scale available in the case 
of mental measures. 

We can approach the problem of devising a scale for 
mental measurement by considering further the measure- 
ment of the physical characteristic of height. 

The zero point is not the only possible origin of measurement of height. Other origins may be devised which are for some purposes more significant. For instance, suppose we were to measure the heights of all individuals in a certain group, say the heights of all adult males in England. We would first express these heights in terms of feet and inches measured from the zero point. From these measures we could calculate the average height of the group. Let us suppose it to be 5 feet 8 inches, that is 68 inches. We could then take this average point on the scale as origin and speak of a man (say) 5 feet high as 8 inches below average, and of
one 6 feet high as 4 inches above average. Clearly it is 
far more important for many purposes to know by how 
much—above or below—an individual deviates from 
the average than merely to know what his actual height 
is in feet and inches. 

We can if we wish go a step further and call the average point of the scale zero, marking deviations above and below this as respectively positive and negative, with corresponding plus and minus signs. Thus in the example given in the last paragraph we could take the 68 inches mark as zero. Relative to this the 5 foot high man would have a ‘mark’ of −8 inches and the 6 foot high man a mark of +4 inches.
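Measuring from the average is simply a subtraction that keeps the sign. A minimal Python sketch with the figures above (average 68 inches; the 5 foot man is 60 inches tall, the 6 foot man 72 inches); the function and constant names are illustrative.

```python
MEAN_HEIGHT = 68  # inches: the supposed group average from the text

def signed_deviation(height_inches, mean=MEAN_HEIGHT):
    """Deviation from the average: positive above it, negative below it."""
    return height_inches - mean

for height in (60, 72):          # the 5 foot man and the 6 foot man
    print(height, signed_deviation(height))
# 60 -8
# 72 4
```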

Our unit, however, is still the inch, which is really a conventional and arbitrary unit agreed on as convenient for certain purposes. Is there any more significant unit?
In order to answer this question it is necessary to give 
some consideration to the way in which human char- 
acteristics are derived. 

Granted that he is not stunted by illness, wrong feeding, or other unfavourable conditions, what fixes the height to which a man will grow? Why are some men naturally (as we say) taller than others? Evidently it must be due to some inborn, and therefore inherited, factor which differs in degree from one member of the human population to another.

The degree of such a factor in any particular person is of course determined by his ancestry, that is by the characteristics of those pairs, stretching back indefinitely to the remotest past, whose matings have finally produced him. So far as he is concerned these matings will have been in effect a matter of ‘chance’, that is they will have been produced by a host of causes mostly independent of each other and largely incalculable. In other words
the height to which a man grows naturally is determined 
by a very large number of independent causes which 
may be imagined as made up of unitary factors each of 
which is either present and totally operative or absent 
and quite inoperative—one might say that each has the 
value 1 or 0 but nothing between, and each is as likely
to be present as absent. The same is true of all inherited 
human qualities, whether physical or mental. 
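The model described here (many independent factors, each worth 1 or 0 and equally likely to be either) can be imitated directly. In this Python sketch the share of the population with exactly k factors present is given by binomial coefficients. With 20 factors the shares already pile up around the middle and tail off symmetrically on either side, and it is a standard result that this shape approaches the normal curve as the number of factors grows. The choice of 20 factors and the function name are illustrative assumptions.

```python
from math import comb

def factor_distribution(n_factors):
    """Share of people with k of n_factors present, for k = 0..n_factors.

    Each factor is independently present (1) or absent (0) with equal chance,
    so the shares are the binomial coefficients C(n, k) divided by 2**n.
    """
    total = 2 ** n_factors
    return [comb(n_factors, k) / total for k in range(n_factors + 1)]

dist = factor_distribution(20)   # 20 factors: an arbitrary illustrative choice

# Crude text histogram: one '#' per half per cent of the population.
for k, share in enumerate(dist):
    print(f"{k:2d} {'#' * round(share * 200)}")
```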


NORMAL DISTRIBUTION 


When a quality is distributed over the population as 
the result of a process such as that described, which 
determines the varying amounts, as it were, of the 
quality possessed by different individuals, the distribution is said to be ‘normal’. Owing to the way it has
come about the normal distribution is often regarded as 
the typical ‘random’ or ‘chance’ distribution. It has 
important properties which we must now consider. 

Let us suppose that we can measure a certain quality 
in terms of units all of which are equal. We will return 
to the case of height as an example. Height is measured 
in inches, and one inch is the same as another inch. Let 
us then determine the number of individuals in the 
population, or in a representative sample of it, lying in the various ranges of height each of one inch, that is […]m. It can be shown mathematically […] of a normally distributed quality […] results will group themselves in a […] range of height just around the average […] cases, and the ranges up and down from the average or mean value, as we generally call it, tail off symmetrically. The further an inch-range of height
is from the mean, the fewer the number of individuals 
whose heights will fall within that range. Hence in a 
normal distribution there is a concentration or clustering 
about the mean and a symmetrical spread tailing off on 
either side of the mean. The continuous curve which 
represents this graphically is something like a cocked 
hat with rounded top. The highest point occurs at the 
mean about which there is the concentration of cases. 
See Fig. 1.


[Fig. 1. The normal curve, with the mean marked at its highest point.]

All this, which can be shown to follow mathematically from the way a normal distribution is produced, is confirmed by actual observation and measurement in the case of height and other physical characteristics which can be measured directly in terms of equal units. But the case is somewhat different when we are measuring some mental characteristic such as ‘intelligence’, for example by an objective test. Here the units are marks in the test and, as they stand (‘raw’ scores we call them), single marks cannot be regarded as exactly equivalent. Thus we cannot say that the mark for answering one particular item is equivalent to the mark for answering another particular item, or that the difference between 10 and 11 marks, say, is equivalent to the difference between 90 and 91 marks.

What we actually find in the case of intelligence and other objective tests, and indeed many other kinds of examination, is that, if these are applied to a representative sample of the whole population or of a whole age-group of the population, the results usually conform
more or less closely to the curve of normal distribution, 
but do not usually fit that curve exactly. The tests must, of course, be soundly constructed with suitable proportions of items of various levels of difficulty. The reason for the approximation to the normal curve is that the results of such tests and examinations are determined by a very large number of factors, many of which are independent of one another. The failure to fit the curve perfectly is due to various causes. There will be small errors in the
operations involved in devising and applying the test, and 
the group to which it is applied may not always be a 
sufficiently representative sample. But undoubtedly one 
of the reasons for the departure from perfect fit is this 
matter of the inequality of units. 

Raw marks, or measures (such, for example, as ‘mental age’) based directly on these, are quite adequate and suitable for many purposes; but where precision is required, and it is important to know the exact significance of our units, we use another method of approach. This is based on a measure of the spread of the normal curve, and we must now discuss this.


BASIC STATISTICAL CONCEPTS 


I have spoken of the concentration of cases about the mean of the normal curve and of the contrary spread out and falling off on either side of the mean. One obvious way of assessing the overall amount of the spread is by finding the average of all the deviations from the mean of the individuals measured. This is called the ‘mean deviation’, not to be confused with the mean itself. There is an average of all the measures (the mean) and an average of the deviations from this (the mean deviation).

In calculating the mean deviation we ignore the sign 
of the deviations, for, as the normal curve is symmetrical 
about its mean, there is for each plus deviation (above 
the mean) a corresponding minus deviation (below the 
mean), so that if we added all the deviations together 
algebraically, that is taking account of plus and minus 
signs, the result would simply be zero. Therefore we 
ignore signs. 
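The mean deviation takes only a few lines to compute. This Python sketch uses a small, deliberately lopsided set of hypothetical heights. It shows that the signed deviations still add to zero (this holds for deviations from the mean of any set of figures, symmetrical or not), which is why the signs are dropped before averaging.

```python
def mean(values):
    return sum(values) / len(values)

def mean_deviation(values):
    """Average distance from the mean, with the signs of the deviations ignored."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)

heights = [64, 67, 68, 73]             # hypothetical heights in inches
m = mean(heights)                      # 68.0
signed = [h - m for h in heights]      # [-4.0, -1.0, 0.0, 5.0]
print(sum(signed))                     # 0.0: plus and minus cancel
print(mean_deviation(heights))         # 2.5
```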

For measuring spread, however, we generally use, not the mean deviation, but another measure called the ‘standard deviation’. Let us suppose that we have measured the deviation of one member of a group from the average or mean height (say 68 inches) of the whole group. If, for example, his height is 70 inches, his deviation is 2 inches above the mean, that is +2 inches. Let us square this, that is multiply it by itself. We then have (+2) × (+2) = +4 as his ‘square deviation’. If his height had been (say) 65 inches his deviation would be 3 inches below the mean of 68 inches, that is −3 inches. His square deviation would then have been (−3) × (−3) = +9. Notice that the square deviation is always plus, because when we multiply minus by minus we get plus. This is a great advantage, and is no doubt one of the reasons for using this measure of spread.

When we have found the square deviations for all the members of the group we add them all together and divide the sum by the number of members in the group to get the average or ‘mean square deviation’. Finally we calculate the square root of this to get the ‘root mean square’ or R.M.S. deviation. This is commonly called the ‘standard deviation’.

To sum up:


In order to find the standard deviation of a group:

1. Find the deviation of each member of the group.
2. Square this.
3. Add all the square deviations together.
4. Divide by the number of members of the group to get the mean square deviation.
5. Find the square root of this to get the root mean square deviation.
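The five steps translate line for line into code. This Python sketch uses a hypothetical group of eight heights chosen so that the mean is 68 inches and the standard deviation comes out at exactly 3 inches; the function name is illustrative.

```python
from math import sqrt

def standard_deviation(values):
    """Root-mean-square deviation from the mean, following the five steps."""
    m = sum(values) / len(values)
    deviations = [v - m for v in values]      # step 1: deviation of each member
    squares = [d * d for d in deviations]     # step 2: square each deviation
    total = sum(squares)                      # step 3: add them together
    mean_square = total / len(values)         # step 4: mean square deviation
    return sqrt(mean_square)                  # step 5: square root

heights = [63, 65, 67, 67, 69, 69, 71, 73]   # hypothetical heights, mean 68
print(standard_deviation(heights))            # 3.0
```

Dividing by the number of members, as in step 4, matches the text; Python's standard library gives the same figure with `statistics.pstdev(heights)`. Its `statistics.stdev` divides by one less than the number of members instead, the usual choice when estimating from a sample.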


The R.M.S. deviation is the standard deviation. It is commonly denoted by the Greek letter sigma (σ). In a normal distribution the mean deviation is rather less than four-fifths (actually 0.7979) of the standard deviation. The mean square deviation (i.e., before the square root is taken) is often called the ‘variance’, and is a very important quantity as it indicates the degree to which the quality which is being measured varies about the mean within the group.
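The figure 0.7979 is the value of √(2/π), the standard result for the ratio of the mean deviation to the standard deviation in a normal distribution. This Python sketch computes that value and checks it by approximating the mean deviation of a normal curve with σ = 1 numerically; the integration range and step size are arbitrary choices.

```python
from math import pi, sqrt
from statistics import NormalDist

exact = sqrt(2 / pi)   # mean deviation / standard deviation for a normal curve

# Numerical check with sigma = 1: average |x| under the curve, by a
# midpoint sum over -8 < x < 8 (the area beyond is negligible).
curve = NormalDist(mu=0, sigma=1)
n = 16_000
step = 16 / n
approx = sum(
    abs(x) * curve.pdf(x) * step
    for x in (-8 + (i + 0.5) * step for i in range(n))
)
print(round(exact, 4), round(approx, 4))   # 0.7979 0.7979
```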

The variance, unlike the standard deviation of which it is the square, has the important property of being what is called ‘additive’. This means that, if the variation in a quality arises from a number of different and independent sources, the total variance of the quality is equal to the simple sum of the variances of the quality arising from the different sources of variation. This makes the variance a convenient quantity to work with when we are considering the relationship between different qualities.
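The additive property holds exactly when the sources of variation are independent of one another (more precisely, uncorrelated). This Python sketch enumerates every combination of two hypothetical independent sources and uses exact fractions so that the equality is not blurred by rounding; the particular values are arbitrary.

```python
from fractions import Fraction
from itertools import product

def variance(values, probs):
    """Mean square deviation of a discrete distribution with the given probabilities."""
    m = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - m) ** 2 for v, p in zip(values, probs))

# Two hypothetical, independent sources of variation in some quality.
a_vals, a_probs = [0, 1, 2], [Fraction(1, 3)] * 3
b_vals, b_probs = [0, 4], [Fraction(1, 2)] * 2

# Independence: each combination occurs with the product of the probabilities.
pairs = list(product(zip(a_vals, a_probs), zip(b_vals, b_probs)))
total_vals = [a + b for (a, _), (b, _) in pairs]
total_probs = [pa * pb for (_, pa), (_, pb) in pairs]

print(variance(a_vals, a_probs), variance(b_vals, b_probs),
      variance(total_vals, total_probs))   # 2/3 4 14/3
```

The standard deviations, by contrast, do not add: √(2/3) + 2 is about 2.82, while √(14/3) is about 2.16.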


We can now express the deviation of any member from the mean of the whole group as so many times the standard deviation, either above (plus) or below (minus) the mean. Thus suppose that in the group we are measuring for height, the mean height of the group being 68 inches, we find the standard deviation (σ) to be 3 inches. If the height of one member of the group is 74 inches, he is 6 inches, or twice the standard deviation, above the mean: a deviation, that is, of +2σ. If the height of another member were 59 inches, that is 9 inches or 3 times the standard deviation below the mean, his deviation would be −3σ.
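Expressing a deviation in standard-deviation units is a single division. A minimal Python sketch using the figures above (mean 68 inches, σ = 3 inches); the function name is illustrative.

```python
def in_sd_units(score, mean, sd):
    """Deviation from the mean expressed as a multiple of the standard deviation."""
    return (score - mean) / sd

print(in_sd_units(74, 68, 3))   # 2.0, i.e. +2 sigma
print(in_sd_units(59, 68, 3))   # -3.0, i.e. -3 sigma
```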

We can easily represent on the diagram of the normal 
curve the distance corresponding to the standard devia- 
tion as in Fig. 2. It will be seen that the line of length 
equal to the standard deviation (S.D.) or σ meets the
curve at a point where the curve stops bending one way 
and begins to bend the other. The mathematical term 
for such a point is ‘point of inflexion’. 


[Fig. 2. The normal curve, with the mean and a distance of one standard deviation from it marked.]
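The claim that the curve changes its bending exactly one standard deviation from the mean can be checked by differentiating the normal curve twice. Writing μ for the mean and σ for the standard deviation (notation assumed here for the derivation), a short sketch:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}

f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x)

f''(x) = \left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) f(x)
       = \frac{1}{\sigma^2}\left(\frac{(x-\mu)^2}{\sigma^2} - 1\right) f(x)
```

Since f(x) is always positive, f''(x) is zero, and changes sign, only where (x − μ)² = σ², that is at x = μ ± σ.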


The actual calculation of statistical quantities such as 
the mean, standard deviation, etc., and others to be 
considered later, is illustrated in Appendix I. 

In the case considered we have been working with 
inch units all of which can be regarded as equivalent. 
But if we are dealing with a mental test we are again up 
against the difficulty of equivalence of units. Thus suppose that, in terms of raw marks, the mean score of a group is 60 and the standard deviation 10; then if one individual has a score of 80 we could only say he was twice the S.D. above average if each mark could be regarded as equivalent to every other mark, and the raw marks themselves were normally distributed. To avoid this difficulty we can use another method of approach.
STANDARD MEASURE AND WEIGHTING


This method depends on the fact that one of the most 
significant things about a particular measure or mark is 
the point at which it divides the group which is being 
measured or tested. This point is indicated by the per- 
centage of the group who fall below the mark. This 
percentage is called the ‘percentile’ corresponding to the 
mark. Thus suppose in an intelligence or attainment test 65 per cent. of a representative group fall below a mark of 80. This mark 80 is said to be the ‘65 percentile’. It is of course reached or exceeded by the remaining 35 per cent. of the group. See Fig. 3.


[Fig. 3. The normal curve, with the mean and the mark 80 indicated.]
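The percentile of a mark, in the sense used here (the percentage of the group falling below it), is a simple count. In this Python sketch the group of 100 marks is invented so that 65 of them fall below 80, as in the example; `percentile_of` is a hypothetical helper name.

```python
def percentile_of(mark, marks):
    """Percentage of the group whose marks fall below `mark`."""
    below = sum(1 for m in marks if m < mark)
    return 100 * below / len(marks)

# Hypothetical group of 100: 65 score 60, 4 score 80, 31 score 82.
marks = [60] * 65 + [80] * 4 + [82] * 31
print(percentile_of(80, marks))   # 65.0, so 80 is the '65 percentile'
```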


What is called the ‘percentile rank’ (P.R.) of the
group-member who scores 80 is taken to be the point
midway between the percentile corresponding to a score
of 80 and the percentile corresponding to the score made
by the member who comes next above. Suppose the
percentile corresponding to this latter score is 69; then
the P.R. of the member with a score of 80 is midway
between 65 and 69, that is 67. Similarly for any other
score. The P.R. of the top member of the group is
taken as midway between the percentile corresponding
to his score and 100. It is important to remember
the distinction between percentiles and percentile ranks.
See Appendix I. A percentile rank is, in effect, an esti-
mate of the ‘true’ percentile position of the group-
member concerned. Such an estimate is needed because the
distribution of marks proceeds not continuously but by
jumps from each member to the next above.
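The rule just given can be written as a short Python sketch. The helper names are hypothetical. Where several members share a score, the sketch assumes that "the member who comes next above" means the next higher score in the group.

```python
def percentile_of(mark, scores):
    """Percentage of the group scoring below `mark` (the 'percentile' of the mark)."""
    below = sum(1 for s in scores if s < mark)
    return 100 * below / len(scores)

def percentile_rank(score, scores):
    """Midway between the percentile of `score` and that of the next higher
    score in the group; for the top score, midway between its percentile and 100."""
    higher = [s for s in scores if s > score]
    upper = percentile_of(min(higher), scores) if higher else 100.0
    return (percentile_of(score, scores) + upper) / 2

# Hypothetical group of ten pupils.
scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
print(percentile_of(60, scores), percentile_rank(60, scores))  # 40.0 45.0
print(percentile_rank(85, scores))                             # 95.0
```

When there are no tied scores, this rule gives each member a P.R. of 100 × (position from the bottom − ½) / N. That is one way of seeing why the P.R. is a mid-point estimate.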

The 25, 50 and 75 percentiles have special names, 
namely ‘lower quartile’, ‘median’ and ‘upper quartile’ 
respectively. The difference between the marks cor- 
responding to the lower and upper quartiles is called 
the ‘interquartile range’, and this difference or one-half 
of it (the ‘semi-interquartile range’) is sometimes taken 
as a rough measure of spread. When there is an even 
number of members in the group, however, the ‘median
mark’ is taken as the mark half-way between the marks
of the two middle members.
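There are several conventions for locating quartiles in a small group, and the text does not commit to one. The sketch below assumes Python's `statistics.quantiles` with `method="inclusive"`, which is one common interpolation rule. `statistics.median` follows the even-group rule just stated. The marks are invented.

```python
import statistics

def quartile_summary(marks):
    """Lower quartile, median, upper quartile and the two interquartile measures."""
    q1, _, q3 = statistics.quantiles(marks, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "lower_quartile": q1,
        "median": statistics.median(marks),  # even group: mean of the two middle marks
        "upper_quartile": q3,
        "interquartile_range": iqr,
        "semi_interquartile_range": iqr / 2,
    }

marks = [12, 15, 15, 18, 20, 22, 25, 30]  # hypothetical, an even number of pupils
print(quartile_summary(marks))
```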

The first step is, then, to scrutinize the results obtained
from measuring the group and hence to convert all the
scores made by the members of the group to their
percentiles.

The second step depends on the fact that when a group
is being measured for a quality which is normally dis-
tributed among them, any particular score will cut off a
percentage of the group in the way already mentioned.
From the curve of normal distribution it is then possible
to find the position of the point of cut-off above or below
the mean, and to express its distance from the mean in
terms of the standard deviation of the normal curve.
There is no direct and simple way of making this cal-
culation, but tables have been drawn up for converting
percentiles to deviations expressed in terms of the S.D.
of the normal curve. It is to these tables that we refer.
They are to be found in full in many textbooks of
statistics, and an abbreviated table is given in Appendix
II.

A percentile of just over 84 corresponds to a score of
+1σ, i.e., one S.D. above the mean; a percentile of just
under 16 to −1σ, i.e., one S.D. below the mean. Simi-
larly the percentile of (say) +1.5σ is 93.3, and of −2σ
is 2.28, and so on. These σ-units of the normal curve
are regarded as equivalent.
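The percentages quoted above can be checked without a printed table by using the standard-library `NormalDist` class. Its `cdf` gives the proportion of the normal curve below a deviation in σ-units, and its `inv_cdf` goes the other way. This is only a cross-check of the table, not the procedure of Appendix II.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)  # deviations measured in sigma-units

# Deviation in sigma-units -> percentile (percentage of the curve below it).
for z in (1.0, -1.0, 1.5, -2.0):
    print(f"{z:+.1f} sigma: percentile {100 * unit_normal.cdf(z):.2f}")

# Percentile -> deviation in sigma-units (the direction used for converting marks).
print(f"84.13 percentile: {unit_normal.inv_cdf(0.8413):+.2f} sigma")
```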

To summarize this process briefly: first convert all 
the scores of the group being measured to percentiles 
by finding at what point each score divides the group, 
and estimate the percentile rank of each member of the 
group. Then convert these P.R.s to positive or nega-
tive σ-units from the tables for the normal curve.
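The two steps can be combined in one hypothetical helper. As the text does, it assumes that the quality is normally distributed in the group. It applies the percentile-rank rule given earlier, treating tied members alike, and uses `NormalDist.inv_cdf` in place of the printed table. A P.R. is never exactly 0 or 100, so the inverse is always defined.

```python
from statistics import NormalDist

def percentile_ranks(scores):
    """P.R. of each score: midway between its percentile and that of the next
    higher score in the group (100 for the top score)."""
    n = len(scores)
    distinct = sorted(set(scores))
    pct_below = {s: 100 * sum(1 for t in scores if t < s) / n for s in distinct}
    pr = {}
    for i, s in enumerate(distinct):
        upper = pct_below[distinct[i + 1]] if i + 1 < len(distinct) else 100.0
        pr[s] = (pct_below[s] + upper) / 2
    return [pr[s] for s in scores]

def to_sigma_units(scores):
    """Raw scores -> percentile ranks -> deviations in sigma-units."""
    unit_normal = NormalDist()
    return [unit_normal.inv_cdf(p / 100) for p in percentile_ranks(scores)]

scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]  # hypothetical raw marks
for s, z in zip(scores, to_sigma_units(scores)):
    print(s, f"{z:+.2f}")
```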

These expressions in σ-units, with + and − signs, are
unfamiliar and a little awkward in some ways. For many 
purposes, therefore, we use a method which yields more 
familiar looking marks which are easier to handle. 

We first choose a convenient number (100 is the
usual choice) to represent the mean instead of zero.
Then we choose another convenient number to repre-
sent the S.D. For many purposes we take 15 for this.
This kind of choice may seem arbitrary, but it is analo-
gous to choosing whether we should measure length,
say, for a certain purpose in inches, feet or miles. It
will depend on the kind of length we are measuring and
the purpose of our measurement.


Suppose we have decided to take 100 as the mean and
15 as the S.D. Then a score of +1σ becomes a standard
score of 100 + 15 = 115, and a score of −2σ becomes
100 − (2 × 15) = 70 (see Fig. 4). Marks expressed in
this way are called standard scores.

Standard scores are of great importance in connection
with tests and examinations. Before considering some
examples of this, we should remind the reader that it
follows from what has been said that the method of
standard scores is strictly applicable only when we are
justified in assuming that the quality measured is
normally distributed, or at any rate very nearly so, in
the group which is being tested, for only then can we use
the conversion table from percentiles to σ-units. As we
have seen, we can regard this condition as sufficiently
satisfied when the individual measurements or scores
are determined by a large number of factors mainly
independent of one another.


Fig. 4. Normal curve showing standard scores of 70 (P.R. 2.28), 100 (the mean) and 115 (P.R. 84.13).
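Once a deviation is expressed in σ-units, the standard score is the chosen mean plus the deviation multiplied by the chosen S.D. In the minimal sketch below, 100 and 15 are the defaults because those are the values taken in the text.

```python
def standard_score(z, mean=100.0, sd=15.0):
    """Standard score for a deviation of z sigma-units on a scale with the given mean and S.D."""
    return mean + sd * z

print(standard_score(1.0))   # 115.0
print(standard_score(-2.0))  # 70.0
print(standard_score(0.0))   # 100.0, the mean
```

Choosing a different pair, such as a mean of 50 and an S.D. of 10, changes only the units, in the same way as the choice between inches, feet and miles.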


One valuable use of standard scores is to enable us to 
relate results in different examination subjects, or in 
different age-groups, and to make allowances or weight 
results accordingly. The basic postulate from which we 
start here is that ‘equivalent’ marks are those which 
divide the group at the same point, that is those having 
the same percentile value. This might perhaps be
regarded as a definition, but it certainly seems the most
reasonable criterion of equivalence. To take an example,
suppose an Arithmetic and an English paper have been
worked. What mark in the English paper will be equiva-
lent to a mark of (say) 60 in the Arithmetic paper? A
scrutiny of the Arithmetic results shows, we will sup-
pose, that the percentile of a mark of 60 for Arithmetic
is 71, i.e., 71 per cent. of the group tested fall below this
mark. Scrutiny of the English results shows that 71 is
the percentile for a mark of 53. Then an Arithmetic
mark of 60 is equivalent to an English mark of 53, i.e.,
the two marks fix the same point in the order in which
the members of the group are placed by the two
tests.

Obviously the next step is to indicate equivalent marks
by the same number. In the above example the ‘raw’
marks, though equivalent, are different, namely 60 and
53. We convert them to the same number by the use of
σ-units and standard scores. The normal curve table
shows that a percentile of 71 corresponds to +0.55σ.
Then suppose we decide to standardize the two tests on
a mean of 100 and an S.D. of 15. An Arithmetic raw score
of 60 and an English raw score of 53 will then each
become a standard score of 100 + (0.55 × 15), that is
108 to the nearest whole number (actually 108.25).
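The Arithmetic and English example can be traced in a short sketch. It assumes the percentile (71) has already been read off each paper's results and uses `NormalDist.inv_cdf` in place of the printed table. The deviation is rounded to two decimal places to imitate reading a two-figure table. That is why the sketch reproduces the 108.25 of the text rather than the slightly larger unrounded value (about 108.30).

```python
from statistics import NormalDist

def standard_score_for_percentile(percentile, mean=100.0, sd=15.0, places=2):
    """Percentile -> sigma-units (rounded as if read from a two-figure table) -> standard score."""
    z = round(NormalDist().inv_cdf(percentile / 100), places)
    return mean + sd * z

# From the text: Arithmetic 60 and English 53 each have a percentile of 71.
percentiles = {"Arithmetic 60": 71, "English 53": 71}
scores = {mark: standard_score_for_percentile(p) for mark, p in percentiles.items()}
print(scores)
```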

In this way the marks in the two tests become directly
comparable with one another. In the same way marks
in the same examination paper scored by different age-groups
may be made directly comparable. The raw marks for
each separate age-group are converted to percentiles,
then to σ-units, and finally to standard scores based
on the same mean and S.D. This provides a method of
making age-allowances, as, for instance, in the case of
allocation at 11+ to secondary education. For this
purpose each age-group should be a representative sample.

The use of standard scores to relate measures in
different fields may be illustrated in connection with
physical characteristics. Suppose a certain boy of 12
(say) is one inch above average in height and 10 lb. above
average in weight. How can we relate these two meas-
ures? Is the boy as much above average in weight as
he is in height? It may be important to know this, but
we cannot compare inches and pounds directly. If,
however, we convert both height and weight measures
to σ-units, the comparison can be made at once.
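To make the comparison concrete we need the S.D.s of height and weight for boys of this age, which the text does not give. The figures below are hypothetical placeholders. Both traits are taken to be roughly normally distributed, so dividing each deviation by its S.D. gives the deviation in σ-units directly.

```python
HEIGHT_SD_INCHES = 2.5   # hypothetical S.D. of height for boys of 12
WEIGHT_SD_POUNDS = 12.0  # hypothetical S.D. of weight for boys of 12

def in_sigma_units(deviation, sd):
    """Express a deviation from the mean as a multiple of the S.D."""
    return deviation / sd

height_z = in_sigma_units(1.0, HEIGHT_SD_INCHES)    # one inch above average
weight_z = in_sigma_units(10.0, WEIGHT_SD_POUNDS)   # ten pounds above average
print(f"height {height_z:+.2f} sigma, weight {weight_z:+.2f} sigma")
```

With these assumed S.D.s, the boy would be further above average in weight than in height.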

More important from the point of view of mental 
measurement is the comparison of such things as intelli- 
gence and educational attainment. We often wish to 
know whether a child's educational progress matches 
his intelligence or whether he is failing to make the best 
of his powers. The method of standard scores enables
his performances in intelligence and attainment tests 
to be directly compared. 

Another most effective use of standard scores is in 
connection with the weighting of tests or examination 
papers. If an examination consists of papers in a number 
of subjects, the ‘weight’ of a particular paper or subject 
is its proportional influence in determining the result 
of the whole examination. 

It used to be thought that the weight of a paper could 
be increased by increasing the maximum marks allotted 
to it. This is a complete fallacy, as a simple example will 
show. Suppose two candidates A and B take an Arith-
metic paper with a maximum of 100, and an English
paper with a maximum of 200. A scores 50 in Arith-
metic and 160 in English; B scores 70 in Arithmetic
and 150 in English. It is clear that in determining the
order of the candidates (B with 220 marks above A with
210 marks) the Arithmetic paper has greater weight
than the English paper, although there is a higher
maximum for English. This is because the difference of marks
between A and B in Arithmetic (20) is greater than in
English (10). Hence the weight of an examination paper, or
any other form of measurement, depends not on the
maximum possible score or on the mean score achieved
by those to whom it is applied, but on the degree to
which the latter are differentiated or spread out by it.
We therefore take the standard deviation of the scores as a
convenient index of weight, as it measures spread.

If, therefore, we wish to give two papers in an exami- 
nation equal weight we standardize the raw scores in 
each of them on the basis of the same mean (say 100) 
and the same S.D. (say 15). If on the other hand we 
wish to give one of the papers double the weight of the 
other we standardize it to a mean of 100 and a S.D. of 
30; or alternatively, and more simply, we just double all 
the previous standard scores in it, for although this will 
double the mean, making it 200, as well as doubling the 
S.D., the change in the mean will not affect the weight, 
as we have seen. 
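Both points can be checked numerically. The first part reproduces the A and B totals from the text. The second part sketches equal and double weighting. For brevity it rescales each paper's marks linearly to mean 100 and S.D. 15, rather than going through percentiles and the normal-curve table. This simplification agrees with the text's method only when the raw marks are themselves normally distributed. The three-candidate marks are hypothetical.

```python
import statistics

# The A/B example from the text (Arithmetic out of 100, English out of 200).
arithmetic = {"A": 50, "B": 70}
english = {"A": 160, "B": 150}
raw_totals = {c: arithmetic[c] + english[c] for c in arithmetic}
print(raw_totals)  # {'A': 210, 'B': 220}

def standardize(raw, mean=100.0, sd=15.0):
    """Linearly rescale one paper's marks to the chosen mean and S.D."""
    m = statistics.fmean(raw)
    s = statistics.pstdev(raw)
    return [mean + sd * (x - m) / s for x in raw]

def weighted_totals(papers, weights):
    """Combine papers after standardizing; a weight of 2 doubles every standard score."""
    standardized = [standardize(p) for p in papers]
    n = len(papers[0])
    return [sum(w * marks[i] for w, marks in zip(weights, standardized)) for i in range(n)]

# Hypothetical marks of three candidates: Arithmetic spreads them widely, English hardly at all.
arith = [30, 50, 70]
eng = [155, 150, 145]
print([a + e for a, e in zip(arith, eng)])    # raw totals, dominated by Arithmetic
print(weighted_totals([arith, eng], [1, 1]))  # equal weight
print(weighted_totals([arith, eng], [2, 1]))  # Arithmetic given double weight
```

With equal weights, the three candidates tie at about 200, because the wide Arithmetic spread no longer swamps the English results. Doubling the Arithmetic weight then places them in the Arithmetic order.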

There is a well-known method of expressing the 
results of mental measurement which is older than the 
standard score method, namely by means of mental 
ages and intelligence or educational quotients of various 
kinds. This procedure is now very familiar. The score
made by a particular individual is compared with the
average scores of different age-groups. His mental age
is the age of the group whose average score is equal to
his own score. His quotient is his mental age divided
by his actual age, and is usually expressed as a percent-
age. Thus we may have mental ages for intelligence,
reading, arithmetic, English and so on, with the cor-
responding intelligence quotients (I.Q.s), reading quoti-
ents, etc.
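The two definitions can be sketched as follows. The mental age is found from a table of average scores by age. The norms below are invented for illustration, and interpolating linearly between tabulated ages is our assumption. The quotient is the mental age divided by the actual age, times 100. Ages are kept in months to avoid fractions of a year.

```python
# Hypothetical norms: (age in months, average score of that age-group), ascending.
NORMS = [(96, 20), (108, 26), (120, 32), (132, 37), (144, 42)]

def mental_age(score, norms=NORMS):
    """Age (in months) at which the average score equals `score`, by linear interpolation.
    Assumes average scores rise strictly with age."""
    for (a0, s0), (a1, s1) in zip(norms, norms[1:]):
        if s0 <= score <= s1:
            return a0 + (a1 - a0) * (score - s0) / (s1 - s0)
    raise ValueError("score outside the range of the norms table")

def quotient(mental_age_months, actual_age_months):
    """Mental age divided by actual age, expressed as a percentage."""
    return 100 * mental_age_months / actual_age_months

ma = mental_age(29)  # a child of exactly 10 years (120 months) scoring 29
print(f"mental age {ma / 12:.1f} years, quotient {quotient(ma, 120):.0f}")
```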
It is in no way suggested that the mental age method
should be entirely dropped in favour of standard scores. 
Each technique has its advantages. The main differ- 
ence between them is that, generally speaking, with 
standard scores we are comparing a child with others 
of his own age-group and finding where he stands in 
that group, whereas with mental ages we are comparing 
him with children of various ages. 

The great advantages of the mental age method are 
that the principle on which it is based is easy to under- 
stand, it is simple to apply, and for many purposes it 
tells us all we want to know. Perhaps its chief dis- 
advantage is that different tests, in the same and in 
different fields, generally spread mental ages and quoti- 
ents differently. This is one of the reasons, but only
one, why we often get different I.Q.s for the same
child from two different intelligence tests. This is not
in itself a reflection on the tests. If the standard devia-
tion of mental ages, and therefore of I.Q.s also, is
greater on test A than on test B, all the I.Q.s above
average will tend to be higher on A than on B, and all
the I.Q.s below average will tend to be lower on A
than on B. A systematic difference of this kind, though
it may mislead us if we try to compare the results of the
tests directly, need not disturb us, for it simply arises from
factors in the construction of the tests. It is only when
there are erratic differences between the results of two
tests that inquiry is indicated into the method of con-
struction and standardization of the tests, the conditions
of their application, and the circumstances of the indi-
viduals to whom they are applied.

The standard score method, on the other hand, rules
out by its very technique all fortuitous differences of
spread. This is its great advantage. It enables us to
compare directly the results of tests in both the same and
different fields. It is therefore particularly suitable for
research and for examinations where it is decided to add
together the results of different papers, either with the
same or different weights, in the manner already
described. But for use in many situations arising from
time to time in the school and classroom, the method
of mental ages and quotients is often the simplest and
most convenient.


SIGNIFICANCE AND STANDARD ERROR 


There is one application of standard measure which is 
particularly important in experiment and research as 
it enables us to check the significance of our results. 
To illustrate by a physical experiment, let us suppose
we wish to determine, say, the wave-length of sodium
light. It is not enough to make one determination for,
if we repeat the experiment, we shall usually get a
slightly different result. This is because a number of
errors arise, partly from the limits of accuracy of the
apparatus used, and partly from inaccurate observation
by the experimenter, either in his use of the apparatus
or through constitutional defects, e.g. of vision, in his
own make-up. It may be remarked here, in passing,
that this last source of error, namely that due to the
particular experimenter himself, varies from one person
to another. In the case of any given person it is often
possible to sum it up in a formula known as the ‘per-
sonal equation’ of that observer, which can then be
used to ‘correct’ the latter’s observations.

To return to the main point. In view of the varia-
tions in the result of observations to determine the wave-
length of light, we proceed to make a large number of
determinations and then take the mean of the results
as the nearest approach to the ‘true’ value.

But this is not the end of the matter. In view of the 
fact that the variation in results is due to a large number 
of chance causes, these results will be distributed about 
their mean normally. The deviation of any one result 
from the mean is its ‘error’, and the standard deviation 
of the whole distribution is called the ‘standard error’ 
of the measurement we are making. The important 
point here is that, although the measures will cluster 
about the mean, they will also spread out from the 
mean, and a few may depart considerably from it. Sup-
pose we find that one of the measures deviates from the
mean by as much as 3σ, that is three times the standard
deviation. Now we find from the tables of the normal
curve giving the relation between deviations and per-
centiles that only about one case in a thousand deviates
from the mean of the normal curve, in a given direction,
by as much as 3σ. Hence, if we find an error of measure-
ment (that is, a deviation from the mean result) which is
as much as 3σ in a given direction, there is only about
1 chance in 1,000 (actually 1.3 in 1,000) that it is merely
due to random sources of error. For a deviation of this size
in either direction the chance is about 2.7 in 1,000. Such an error is very much more likely
to be due to some special cause connected with the experiment we are
making. Similarly, for any degree of error, we can find
the chance that it is due to something purely accidental
and not to some special cause.
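The "one chance in a thousand" can be checked directly against the normal curve. This small sketch computes the proportion of a normal error distribution lying beyond k standard errors, either on one side only or on both sides together.

```python
from statistics import NormalDist

def chance_beyond(k, both_directions=False):
    """Proportion of a normal error distribution lying more than k standard errors
    from the mean: on one side only, or on either side."""
    one_side = NormalDist().cdf(-k)
    return 2 * one_side if both_directions else one_side

for k in (1, 2, 3):
    print(k, f"{1000 * chance_beyond(k):.1f} in 1000 (one side)",
          f"{1000 * chance_beyond(k, both_directions=True):.1f} in 1000 (either side)")
```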

In experiments on physical phenomena the standard
errors are usually small, as the conditions are in general
easily controllable and the instruments of observation
and measurement are highly accurate. The case is
different with experiments in mental measurement, where
our instruments are much less accurate and where it is
often extremely difficult to eliminate or to reduce con-
siderably the influence of some of the disturbing factors.
Standard errors therefore tend to be comparatively large,
but they are nevertheless of great importance in deter-
mining how much significance is to be attached to the
result of an experiment. Indeed, without using standard
errors in this way we cannot place any reliance on the
result of an educational experiment.¹

To illustrate the foregoing let us suppose that we 
wish to test the effectiveness of, say, a new method of 
teaching reading. The obvious thing to do is to try the 
new method on a group of children over a period of time 
and compare the progress made with that made by 
another, generally similar, group of children, during the 
same period, who continue to be taught by the old 
method. We call the first group the ‘experimental’ 
group and the second group the ‘control’ group. 

Both groups are given a test of reading ability at the 
beginning and again at the end of the period. If the 
experimental group show a greater average increase in 
score in the test during the period than that shown by
the control group we cannot conclude forthwith that the 
greater progress shown by the experimental group is 
due to the superiority of the new method of teaching. 


It might possibly be due to a combination of chance
factors. We have to estimate the likelihood of
this.

¹ It was formerly customary to give, as a measure of the spread
of an error-distribution, the ‘probable error’ (P.E.) rather than
the standard error. The probable error is the distance from the
mean of the error distribution such that one-half of the errors lie
within the limits P.E. above and P.E. below the mean. That is,
it is the semi-interquartile range, or ‘quartile deviation’. The
standard error (S.E.) is the standard deviation of the error
distribution. Since both quantities are obtained by definition from
the normal curve, the relation between them is a fixed one:
P.E. = 0.6745 × S.E. (that is, just over two-thirds).
This is done by calculating the standard error of the
difference between the two increases in average score 
made by the two groups. The calculation can be made 
from the various data provided by the tests, such as the 
average or mean score made by each group at the begin- 
ning in the first test, and the mean score made by each 
group at the end in the second test. 

If the difference between the two increases in score is 
as much as three times its standard error, the odds 
against this difference being due to chance approach 1000 
to 1, so that it is fairly safe to assume that the greater 
progress in reading ability made by the experimental
group is really due to the superiority of the new method 
of teaching. 
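The text leaves the calculation of this standard error to the data. One common method, assumed here rather than taken from the text, works from each child's gain. The standard error of a group's mean gain is the S.D. of the gains divided by the square root of the number of children. For two independent groups, the standard error of the difference is the square root of the sum of the two squared standard errors. The gains below are invented.

```python
import math
import statistics

def se_of_mean(values):
    """Standard error of a mean: sample S.D. divided by the square root of the group size."""
    return statistics.stdev(values) / math.sqrt(len(values))

def compare_gains(experimental_gains, control_gains):
    """Difference of mean gains, its standard error, and their ratio (independent groups)."""
    diff = statistics.fmean(experimental_gains) - statistics.fmean(control_gains)
    se_diff = math.sqrt(se_of_mean(experimental_gains) ** 2 + se_of_mean(control_gains) ** 2)
    return diff, se_diff, diff / se_diff

experimental = [8, 10, 12, 9, 11]  # hypothetical gains in reading score
control = [4, 6, 5, 3, 7]
diff, se_diff, ratio = compare_gains(experimental, control)
print(f"difference {diff:.2f}, S.E. {se_diff:.2f}, ratio {ratio:.1f}")
print("significant" if ratio >= 3 else "not significant")
```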

It is common practice to assume provisionally that, 
when a result is at least three times its standard error,
this result is due to the method with which we are 
experimenting. The result is then said to be ‘significant’. 

Hitherto I have been dealing with the technique of 
measurement of one quality or trait. We must go on to 
consider the relation between the measures of two or
more qualities, e.g. height and weight, or intelligence
and attainment. But before doing so we will give some
consideration to methods of mental measurement or
assessment other than by means of tests yielding
numerical scores. Such methods have already been
mentioned and include judgments by experts or panels 
of experts classified into a limited number of categories. 




CATEGORIES AND RATING SCALES


If we are making a judgment in regard to some quality
of a particular individual, the simplest way of stating our
judgment is to say whether, in our opinion, that indivi-
dual is average, above average, or below average in
regard to the degree in which he is characterized by the
quality in question. In considering a number of indivi-
duals this will give us a three-fold classification. If the
group of individuals is such that the quality may be
regarded as normally distributed among them, we should
expect to find considerably more placed in the average
category than in either the ‘above’ or ‘below’ categories.
Such a three-fold classification provides, however,
only very rough estimates, and it is more usual to take
at least five categories (a 5-point scale, as it is sometimes
called), which are commonly labelled A, B, C, D, E.
Verbal descriptions applicable to these categories will
vary with the nature of the quality which is being
assessed, but a quite general description covering most
qualities might be as follows.

A = Quite outstanding.
B = Definitely above average but not outstanding.
C = Average.
D = Definitely below average but not very weak.
E = Very weak.


In rating particular traits we may often require more
precise descriptions of our categories. For example,
in rating what might be called ‘sociability’ the descrip-
tions might run somewhat as follows:

A = extremely lively and satisfied when in the society of
others but seems ‘lost’ when alone and thrown on
his own resources.
B = though not without personal resources, seems definitely
happiest when in the company of others.
C = mixes naturally with others on appropriate occasions
while preserving an even balance between social and
personal life.
D = shares with some effort in social activities but evidently
prefers his own company for the most part.
E = extremely unsociable and isolationist.


This is not, of course, the only way of categorizing 
a trait such as ‘sociability’, but it clearly implies that, 
for some traits, neither the A nor even perhaps the 
B category may be the most desirable, but rather the 
more evenly balanced C category. Consider, for example, 
a trait such as ‘emotionality’ ranging from A =
‘extremely sensitive and excitable’ to E = ‘extremely
stolid and unresponsive’. Here it would probably be 
generally agreed that the C category of average emotion- 
ality and balanced response was in most circumstances 
the one to be desired. 

For our present discussion we can return to the
simpler and more general category descriptions first
mentioned and based on C = ‘average’. For some
purposes the 5-point scale may be regarded as inade-
quate, and it is then extended to perhaps an 11-point
scale, usually by adding plus and minus signs to the
B, C and D categories. We may illustrate from the
B category thus:

B+ = much above average. Very superior but not quite
distinguished enough for the A category.
B = comfortably in the B category though with no
pretensions to A.
B− = just about enough above average to qualify for admis-
sion to the B category.

Similarly C+ will be ‘high average’ but not quite good
enough for B−; in fact it just misses B, whereas B− just
makes it, and so on.

It is possible to train oneself to make judgments of
this kind rapidly, yet with considerable confidence, on
the basis of experience and practice. The first step in a
particular case is to decide which of the five categories
is the most suitable for the individual concerned, and
then to ponder the degree of confidence with which one
has assigned him to that category. Suppose the cate-
gory were C. If one has felt some doubt as to whether he
really reaches the C standard, mark him C−. If on the
contrary one has felt quite confident of the C mark and
even wondered whether he was not perhaps worth B,
then mark him C+. With no special feeling either way,
mark him just C.

There is a further refinement which has occasionally 
been adopted as one method of translating an assessment 
of this kind into a numerical mark. In principle it is 
as follows: 

Rule a line 100 millimetres long and divide it into 
five equal parts marked, respectively, A, B, C, D, E. 
In assessing an individual proceed first as before by 
placing him in one of the five categories. Then decide 
whereabouts in this category you think he should come, 
and indicate this point by a tick on the line. Apply a
100-millimetre scale to the line with its zero point at the
extreme end of the E section. The distance in milli-
metres of the tick from this zero, measured by the scale,
can be taken as a numerical mark of assessment.
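The line method amounts to a simple conversion, sketched below under two assumptions. The first is that the five equal sections run E (0 to 20 mm), D, C, B, A (80 to 100 mm). The second is that the position of the tick within its section is recorded as a hypothetical fraction from 0 (lower boundary) to 1 (upper boundary).

```python
SECTION_MM = 20          # a 100 mm line divided into five equal sections
SECTIONS = "EDCBA"       # the zero point is at the outer end of the E section

def line_mark(category, position_within):
    """Distance in mm of the tick from the zero point.

    `position_within` is a hypothetical way of recording where the tick falls
    inside its section: 0.0 at the lower boundary, 1.0 at the upper boundary.
    """
    if len(category) != 1 or category not in SECTIONS:
        raise ValueError(f"unknown category: {category!r}")
    if not 0.0 <= position_within <= 1.0:
        raise ValueError("the tick must lie inside the chosen section")
    return SECTION_MM * (SECTIONS.index(category) + position_within)

print(line_mark("C", 0.5))   # 50.0: the middle of the line, an exactly average C
print(line_mark("B", 0.25))  # 65.0: low in the B section
```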

It is not uncommon for attempts to be made to relate
the categories to the percentage of cases which should
fall in them. It may be laid down, for example, that
6 per cent. should fall in the A category, 24 per cent.
in the B, 40 per cent. in the C, 24 per cent. in the D,
and 6 per cent. in the E. These percentages, or others
near to them (sometimes related specifically to the
normal curve), are commonly used.

This may be looked at in two different ways. On the
Thus to say that an individual is A would mean that he 
comes in the top 6 per cent.; to say that he is C would 
mean that he comes in the middle 40 per cent. and so on. 
A method of applying this would be to arrange all those 
concerned in a rough order of ‘merit’, and then mark 
the top 6 per cent. A, the next 24 per cent. B, and so on. 
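Two sketches may help here. The first shows one way the percentages can be "related specifically to the normal curve". It assumes that each of the five categories is one S.D. wide, with C centred on the mean. This gives roughly 7, 24, 38, 24 and 7 per cent., close to the 6, 24, 40, 24 and 6 quoted. The second applies the method just described to a list already arranged in order of merit. The names are placeholders. For groups whose size is not a multiple of 50, we chose to round the cumulative percentages.

```python
from statistics import NormalDist

INF = float("inf")
# Assumed bands, in sigma-units, each one S.D. wide with C centred on the mean.
BANDS = {"A": (1.5, INF), "B": (0.5, 1.5), "C": (-0.5, 0.5),
         "D": (-1.5, -0.5), "E": (-INF, -1.5)}

def band_percentages(bands=BANDS):
    """Percentage of a normal distribution falling in each category band."""
    unit_normal = NormalDist()
    return {cat: 100 * (unit_normal.cdf(hi) - unit_normal.cdf(lo))
            for cat, (lo, hi) in bands.items()}

QUOTAS = [("A", 6), ("B", 24), ("C", 40), ("D", 24), ("E", 6)]  # per cent., from the text

def assign_by_order(names_best_first, quotas=QUOTAS):
    """Mark the top 6 per cent. A, the next 24 per cent. B, and so on."""
    n = len(names_best_first)
    categories, cumulative, start = {}, 0, 0
    for category, pct in quotas:
        cumulative += pct
        end = round(n * cumulative / 100)
        for name in names_best_first[start:end]:
            categories[name] = category
        start = end
    return categories

print({c: round(p, 1) for c, p in band_percentages().items()})
pupils = [f"pupil {i}" for i in range(1, 51)]  # 50 placeholder names in order of merit
marks = assign_by_order(pupils)
print([list(marks.values()).count(c) for c in "ABCDE"])  # [3, 12, 20, 12, 3]
```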

On the other hand a somewhat different method of 
approach is to consider each individual separately and 
to try to decide whether he comes in the top 6 per cent., 
or the next 24 per cent., or the middle 40 per cent. etc. 

Before this can be done, however, it is necessary to be 
more precise about the nature of the reference group. 
Thus suppose a teacher is assessing a pupil for intelli- 
gence or for attainment in some subject. If he is con- 
sidered in relation to the group of pupils of similar age 
in the same school, it may be possible to estimate, with-
out special tests, the percentage-range in which he
should be placed, though this is by no means easy.
But if the reference group were the whole population of
similar age, the estimate would be extremely difficult
and generally unreliable. If the qualities assessed were
those of character or temperament, the difficulties
would be much greater in both cases. Hence it is doubt-
ful whether such things as entries on record cards
which require teachers to assess individuals by literal
marks defined in terms of percentages of the general
population are really of much value.

The extreme case of the category method of assess-
ment is the ‘order of merit’, in which each individual is
ranked by his position in the order. It may be noted
again here that, where an order of merit is based, not
on assessment, but on the results of some test or other
measure expressed numerically, the score of the middle
individual of the group is called the ‘median’. If there
is an even number in the group, the median score is the
average of the scores of the two middle members. The
scores of the group members who come at the 25th and
75th percentiles, that is midway between the middle and
the extremes of the group, are called, respectively, the
‘lower quartile’ and the ‘upper quartile’. The difference
between these scores is the ‘interquartile range’ and is
one rough measure of spread.

Another method of collecting data for measurement 
or assessment which is sometimes used is that of the 
questionnaire. A list of questions is addressed to each 
member of the group which is the subject of inquiry. 
Preferably the answers required should be short and
unambiguous, such as ‘yes’ or ‘no’, and this calls for
some skill in the framing of the questionnaire. It is of
course essential to the success of this method that the
subjects of the inquiry should ‘play fair’ and record
their answers in good faith. Steps should therefore be
taken to secure their confidence and understanding, and
to arrange that answers are recorded anonymously where
this seems desirable. The questionnaire method is use-
ful in the investigation of such things as interests and
attitudes of various kinds, e.g. educational, social,
political or religious, where other methods are impractic-
able or inappropriate, and in collecting certain kinds of
factual data about the life circumstances of the subjects
concerned. The data obtained are submitted to statistical
treatment. In some cases they provide a ‘rating scale’
based on an individual's responses to the questionnaire.
Such a scale gives an indication, for example, of the
nature of his interests in certain directions or his
attitudes towards certain things.
At this point I will leave the question of the measure-
ment of a single variable characteristic and pass on to
that of the relationship between the measures of two or
more variables. This question is dealt with in what is
known to statisticians as the ‘theory of correlation’.


Chapter Four 


MEASURES OF TWO OR MORE VARIABLES 
AND THEIR INTERRELATIONSHIP 


It is clear that the discovery of the nature of the
relationships between different human characteristics
is of the greatest importance. These characteristics
may be physical, such as height and weight, or mental,
such as intelligence, temperament, educational attain-
ment, and so on.


We can, of course, investigate the relationship between
two variables precisely only by investigating the rela-
tionship between their numerical measures. In what
follows I shall assume, unless otherwise stated, that the
variables dealt with are such as may be supposed to be
normally distributed in the groups concerned. Where
this is not the case, certain modifications in treatment
are required, which are considered in statistical textbooks.

One way of determining whether two variables are related
and, if so, what is the nature and degree of the relationship,
is to see whether they vary together. Thus an increase in
height is usually accompanied by an increase in weight,
and both of these usually increase with age. We can
say there is a relationship between height, weight and
age. To find the nature of the relationship between
height and weight alone, irrespective of age, one method
would be to pick a group all of whom were of the same age.

Another example of relationship of a different kind is 
that between the pressure and volume of a given quan- 
tity of air or other gas kept at constant temperature. It 
is well known—and obvious—that as the pressure on the 
gas is increased the volume diminishes. Hence pressure 
and volume are related, because they change together,
but this time the relationship is a negative one, for as one
variable gets bigger the other gets less.

All this suggests that a useful and significant method
of indicating the degree of relationship between A and
B, through a consideration of the way they vary to-
gether, would be by the proportion of their whole vari-
ances which is due to factors influencing them both,
and this is the course commonly adopted. On the
other hand, the proportion of the variance of one which
is due to factors which do not influence the other is a
measure of their degree of independence of one another.
These two proportions together evidently account for
the whole variance.

There is a comparatively simple method of calculat-
ing an index which expresses the degree to which two
qualities vary together and are therefore related: the
degree of their ‘covariation’, we might say.

Suppose we have the marks or measures of each
member of a group in respect of two qualities X and
Y. X might be height, and Y weight; or X might be
intelligence and Y attainment, and so on. If one member
of the group scores above the average or mean for both
X and Y, this provides some evidence that X and Y
vary together. The same would be true if the score had
been below average for both X and Y. On the other
hand, if the score were above average for X and below
average for Y, or vice versa, this would provide some
evidence that X and Y do not vary together, at least in
the same direction. We want a method of summarizing
the whole evidence when we consider the scores of all the
members of the group. This is how it is done:

Suppose one person scores (say) 3 marks above average (+3) for X and 5 marks above average (+5) for Y. Multiply +3 and +5 together to give +15. It is right that this product should be positive or plus, as we have here positive evidence of covariation of X and Y. Suppose another person scores 4 marks below average for X (−4) and 5 marks below average (−5) for Y. Multiply −4 and −5 to give +20. Again the product is rightly positive, as we have here positive evidence for covariation. Suppose still another person scores 4 marks above average (+4) for X, and 2 marks below average (−2) for Y. Multiply +4 and −2 to give −8. It is right that the product should be negative or minus, as we have here negative evidence, or evidence against covariation of X and Y in the same direction. Moreover the higher or lower the scores, that is the more they deviate from the mean above or below, the stronger the evidence in either case, and the size of the product of the deviations reflects this, for larger deviations give a larger product.

To get the overall evidence for the group as a whole we add all the positive products together (the total positive evidence) and then add all the negative products together (the total negative evidence), each product measuring the strength of the evidence, for or against, provided by one member; and then we subtract the negative sum from the positive sum. The result expresses the overall evidence. If it is zero, the positive and negative cancel out, and there is no evidence of relationship between X and Y; if it is positive, there is evidence that X and Y vary together in the same direction, that is as one gets larger the other also tends to get larger, and when one gets smaller the other also tends to get smaller; but if the result is negative there is evidence that X and Y vary together but in opposite directions, that is as one gets larger the other tends to get smaller. The greater the magnitude of the result the more closely do X and Y vary together.
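The tally can be checked with the three persons quoted above. This is a minimal Python sketch using only those deviations; three persons do not make a complete group, so the figures illustrate the arithmetic rather than a real covariation.

```python
# Deviations from the group means (X, Y) for the three persons quoted in the text.
deviations = [(+3, +5), (-4, -5), (+4, -2)]

# Product of the two deviations for each person, sign included.
products = [dx * dy for dx, dy in deviations]

positive_evidence = sum(p for p in products if p > 0)   # total positive evidence
negative_evidence = -sum(p for p in products if p < 0)  # total negative evidence, as a magnitude
overall_evidence = positive_evidence - negative_evidence

print(products)            # [15, 20, -8]
print(positive_evidence)   # 35
print(negative_evidence)   # 8
print(overall_evidence)    # 27
```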

We commonly divide the result of summing up the positive and negative products, in the way just described, by the number of members of the group, to get what might be called the ‘average’ or ‘on-the-whole’ evidence of relationship between X and Y. This average result is called the ‘covariance’ of X and Y.

Hence to get the covariance of X and Y we proceed thus:

1. Find for each member of the group, with the proper signs (+ or −), the deviations of his scores for X and for Y from the mean scores of the group for those two qualities.

2. Multiply these two deviations algebraically, that is giving the product a plus sign when both deviations are positive or both negative, and a minus sign when one deviation is positive and the other negative.

3. Add all the products algebraically, that is subtract the sum of the negative products from the sum of the positive products.

4. Divide the result by the number of members of the group.
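The four steps translate directly into a short function. A minimal Python sketch, assuming each quality's scores are held in a list in the same order of members; the function name `covariance` and the marks used are illustrative inventions, not data from the text.

```python
def covariance(xs, ys):
    """Covariance of two score lists, following the four steps above (divides by N)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 1: signed deviations of each member's scores from the group means.
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Step 2: multiply the two deviations algebraically for each member.
    products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
    # Step 3: add the products algebraically (negative products reduce the total).
    total = sum(products)
    # Step 4: divide by the number of members of the group.
    return total / n


# Hypothetical marks for five pupils in X and Y.
x_marks = [10, 12, 14, 16, 18]
y_marks = [20, 21, 25, 24, 30]
print(covariance(x_marks, y_marks))  # 9.2
```

Reversing the order of one list, so that high X goes with low Y, gives a negative covariance instead.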

A better known, and hitherto more widely used, index of the relationship between two qualities is that known as the ‘coefficient of correlation’. It is closely connected with the covariance and is found thus. For a member of the group, find the deviation of his X-score from the mean X-score for the group and divide this deviation by the standard deviation of the X-scores of the group. Then find the deviation of his Y-score from the mean Y-score, and divide it by the standard deviation of the Y-scores. This corresponds to the first step in finding the covariance. The remaining three steps are exactly the same as for the covariance, the only difference being that we are now working not with the ordinary deviations, but with these divided by the corresponding standard deviation. The result of the fourth and last step (see above) is called the ‘coefficient of correlation’. It is equal to the covariance divided by the S.D. of the X-scores, and also by the S.D. of the Y-scores.


For those interested in formulae the foregoing may be expressed thus:

Let N be the number of members of the group.

Let x be the deviation of the X-score of one of the members from the mean X-score of the group.

Let y be the deviation of the Y-score of that same member from the mean Y-score of the group.

x and y may be plus or minus.

We then take the product xy. Add all these products together algebraically, and call the sum $\sum xy$. Then divide by the number in the group, to get

$$\frac{\sum xy}{N}$$

This is the covariance.

To get the coefficient of correlation divide x by $\sigma_x$, the S.D. of the X-scores, and divide y by $\sigma_y$, the S.D. of the Y-scores. Take the product $\frac{x}{\sigma_x}\cdot\frac{y}{\sigma_y}$ and add for all members of the group, to get $\sum \frac{x}{\sigma_x}\cdot\frac{y}{\sigma_y}$. Now divide by N, to get

$$r = \frac{1}{N}\sum \frac{x}{\sigma_x}\cdot\frac{y}{\sigma_y} = \frac{\sum xy}{N\,\sigma_x\,\sigma_y}$$

This is the coefficient of correlation. Evidently it is equal to the covariance divided by $\sigma_x$ and $\sigma_y$, and is usually denoted by r.
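Since r is the covariance worked out from scores in standard measure, it can be sketched in a few lines of Python. This assumes S.D.s calculated with divisor N, to match the division by N above; the function names and the marks are illustrative.

```python
import math


def standard_measures(scores):
    """Deviations from the mean divided by the S.D. (S.D. taken with divisor N)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    if sd == 0:
        raise ValueError("scores with no spread have no standard measures")
    return [(s - mean) / sd for s in scores]


def correlation(xs, ys):
    """Product-moment r: the average product of the paired standard measures."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty score lists of equal length")
    zx = standard_measures(xs)
    zy = standard_measures(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


x_marks = [10, 12, 14, 16, 18]   # hypothetical marks
y_marks = [20, 21, 25, 24, 30]
print(round(correlation(x_marks, y_marks), 4))  # 0.9237
```

Dividing the covariance of the raw marks by the two S.D.s gives the same figure, which is the equivalence stated in the formula.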


When we express score deviations from the mean in terms of the standard deviation by dividing them by the latter, the results are said to be in ‘standard measure’. The more nearly the distribution of the raw scores approximates to a normal distribution, the more nearly will the standard measures thus obtained approximate to the standard values in terms of σ obtained from percentiles of the normal curve by the method already considered.

The greatest possible numerical value of r, the correlation coefficient, is 1 (+ or −), when X and Y vary together exactly. The least possible numerical value of r is 0, when there is no relation or covariation of X and Y. When X and Y vary together in the same direction r is positive; when they vary in opposite directions r is negative. In all cases the numerical value of r is between 0 and 1, the greater the value the closer the relation or covariation of X and Y. The correlation coefficient obtained by the method described in the foregoing is called the ‘product-moment’ coefficient of correlation. See Appendix I, Table IV, for an example.

The value of the correlation between two qualities X and Y obtained from a particular group of people is, of course, only an estimate of the true value, and is therefore subject to error. The standard error of this estimate of r can be found by squaring r, subtracting this square from 1, and dividing the difference by the square root of the number of members in the group from which the value of r is obtained. Thus, expressed in a formula, the standard error of r is

$$\sigma_r = \frac{1 - r^2}{\sqrt{N}}$$

As we have seen, the probable error is .6745 of this.
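A short Python sketch of the two formulas, assuming a group of N members and an obtained r; the function names and the figures r = .6, N = 100 are illustrative only. The formula is a large-sample approximation and should not be relied on for very small groups.

```python
import math


def standard_error_of_r(r, n):
    """Standard error of r found from a group of n members: (1 - r^2) / sqrt(n)."""
    if not -1 <= r <= 1 or n < 1:
        raise ValueError("r must lie between -1 and 1 and n must be positive")
    return (1 - r * r) / math.sqrt(n)


def probable_error_of_r(r, n):
    """Probable error, taken as .6745 of the standard error."""
    return 0.6745 * standard_error_of_r(r, n)


# Hypothetical: r = .6 found from a group of 100.
print(round(standard_error_of_r(0.6, 100), 4))  # 0.064
print(round(probable_error_of_r(0.6, 100), 4))  # 0.0432
```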

When two qualities X and Y are related, the proportion of the variance of either of them which is due to the influence of factors affecting both is given by the square of the correlation between them, that is r². This square indicates the closeness of the relation between the two qualities, for it is a measure of what might be called the ‘overlap’ of the two qualities.

For example, if the correlation between X and Y is .8, the proportion of the variance of either which is due to factors influencing both is .8 × .8, namely .64 or 64 per cent. We might then say that there is a 64 per cent relationship between X and Y and a 36 per cent lack of relationship. It is therefore important, in estimating the closeness of the relationship between two qualities, to look, not at the correlation between them, but at the square of this.

If we know a person’s score in regard to one of the qualities, X, and also the correlation between X and Y, it is not possible to say exactly what his score will be for Y except when the correlation r is equal to 1 (+ or −). When r is numerically less than 1 it is, however, possible to say what the most probable score for Y will be if we know the score for X. It is first necessary to express the given raw X-score in standard measure by finding its deviation from the mean of the X-scores and then dividing this deviation by the standard deviation. We then multiply by the correlation r, and the result is the most probable Y-score expressed in standard measure. This can, of course, readily be converted back to the corresponding raw Y-score.

The most probable Y-score for a given X-score is, in fact, the mean or average score for Y of all those persons who obtain the given score for X. By a similar calculation we can find the most probable X-score for a given Y-score. An example of the necessary calculations is given in Appendix I.
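The steps for the most probable Y-score can be sketched as follows, assuming the means and S.D.s of both sets of scores are known; the function name and the figures in the usage line are invented for illustration.

```python
def most_probable_y(x_score, r, mean_x, sd_x, mean_y, sd_y):
    """Most probable Y-score for a given X-score, by the standard-measure method in the text."""
    z_x = (x_score - mean_x) / sd_x   # the X-score in standard measure
    z_y = r * z_x                     # most probable Y-score in standard measure
    return mean_y + z_y * sd_y        # converted back to a raw Y-score


# Hypothetical figures: X has mean 100 and S.D. 15, Y has mean 50 and S.D. 10, r = .6.
print(round(most_probable_y(130, 0.6, 100, 15, 50, 10), 2))  # 62.0
```

A pupil 2 S.D.s above the mean in X is expected to be only 1.2 S.D.s above the mean in Y: whenever r is numerically less than 1, the most probable score lies nearer the mean than the given one.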

If we take a list of X-scores and find for each of these the corresponding mean or most probable Y-score, we can plot the results on a graph to get what is known as a ‘regression line’. Similarly we can obtain another regression line by plotting Y-scores against the corresponding mean or most probable X-scores. Hence these regression lines express graphically the relation between the scores for one of the qualities and the corresponding mean scores for the other quality.
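When the regression is linear, each regression line passes through the point given by the two means; the line of Y on X has slope r σy/σx and the line of X on Y has slope r σx/σy. A Python sketch under that linearity assumption, with invented figures and an illustrative function name:

```python
def regression_lines(r, mean_x, sd_x, mean_y, sd_y):
    """Return the two regression lines (mean Y for given X, mean X for given Y) and their slopes."""
    slope_y_on_x = r * sd_y / sd_x
    slope_x_on_y = r * sd_x / sd_y

    def y_on_x(x):
        return mean_y + slope_y_on_x * (x - mean_x)

    def x_on_y(y):
        return mean_x + slope_x_on_y * (y - mean_y)

    return y_on_x, x_on_y, slope_y_on_x, slope_x_on_y


y_on_x, x_on_y, b_yx, b_xy = regression_lines(0.6, 100, 15, 50, 10)
print(round(b_yx, 3), round(b_xy, 3))  # 0.4 0.9
print(round(b_yx * b_xy, 3))           # 0.36, that is r squared
print(y_on_x(130), x_on_y(62))         # the two lines do not lead back to the same point
```

The product of the two slopes is r², so the two lines coincide only when r is 1 (+ or −); here an X-score of 130 gives a most probable Y-score of 62, but a Y-score of 62 points back to an X-score of about 110.8, not 130.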

In most cases of mental and educational measurement the regression lines approximate to straight lines. The regression is then said to be ‘linear’. If, however, the lines do not approximate to straight lines the product-moment method of calculating relationship or covariation is not applicable. A different, and much more complex, technique has to be used to obtain what is known as the correlation ratio, usually denoted by η.

When we have the case of more than two related variables, for example height, weight, and age, we are presented with a problem in what is called multiple correlation. We can calculate the correlation between each pair of variables in the usual way, and these correlations are in this case called the ‘raw’ correlations. By a further calculation we can also find the correlation between any pair of the variables when the influence of one or more of the remaining variables is eliminated. Such a correlation is called a ‘partial correlation’.
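The text does not give the ‘further calculation’. For a single variable held constant, the usual first-order formula is r12.3 = (r12 − r13 r23) / √((1 − r13²)(1 − r23²)); the sketch below assumes that formula, and the three raw correlations are invented for illustration.

```python
import math


def partial_correlation(r12, r13, r23):
    """Correlation of variables 1 and 2 with the influence of variable 3 eliminated."""
    denominator = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    if denominator == 0:
        raise ValueError("variable 3 correlates perfectly with variable 1 or 2")
    return (r12 - r13 * r23) / denominator


# Hypothetical raw correlations: 1 = height, 2 = weight, 3 = age.
r_hw, r_ha, r_wa = 0.7, 0.6, 0.5
print(round(partial_correlation(r_hw, r_ha, r_wa), 3))  # 0.577
```

With these figures the correlation between height and weight falls from .7 to about .58 once age is eliminated.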

There are two correlation coefficients of special 
importance in relation to a particular test or other 
instrument of measurement. The first is a measure of the 
self-consistency of the test, and is the correlation between 
the results of applying the test to the same group of 
people on two different occasions separated by only a short interval of time. Evidently if the two sets of results were widely different from one another no reliance could be placed on the test. Thus the correlation between the two sets of results is a measure of the degree of reliability of the test in this sense, and is in fact called the ‘reliability coefficient’. An alternative
method of finding the reliability is to set the test once 
only and then to find the correlation between one half of 
the test (the odd items) and the other half (the even 
items). A correction can then be applied to adjust the 
result to the full-length test. 
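The text leaves the correction unnamed. A standard choice, assumed here, is the Spearman-Brown formula, which for a test twice as long as each half gives 2r/(1 + r). The Python sketch below uses invented right/wrong item scores and treats items 1, 3, 5, ... as the odd half; the function names are illustrative.

```python
import math


def correlation(xs, ys):
    """Product-moment r with S.D.s taken with divisor N."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


def spearman_brown(r_half, length_factor=2):
    """Reliability of a test length_factor times as long as the part correlated."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


def split_half_reliability(item_scores):
    """item_scores: one row per person, one entry per item (1 = right, 0 = wrong)."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(correlation(odd_totals, even_totals))


# Hypothetical results of five persons on a six-item test.
items = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
print(round(spearman_brown(0.6), 4))  # 0.75
print(round(split_half_reliability(items), 3))
```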

The other coefficient referred to is known as the ‘coefficient of validity’ and is a measure of the validity of the test for the purpose for which it was constructed. The coefficient of validity is the correlation between the results of the test and an accepted criterion of the character the test is designed to assess. Thus the early intelligence tests were validated by comparison with subjective judgments such as those of teachers. The children tested were ranked by the pooled judgments of those who knew them well in what was considered to be the order of their intelligence, and this order was compared with that given by the tests and the correlation then computed. Again, tests for allocating children to the various forms of secondary education are validated by ‘following up’ the children and observing their achievements after a considerable lapse of time. This achievement is accepted as a criterion of the fitness of the type of secondary education for the child concerned, the test having been devised for the purpose of picking the most suitable type of education for each child.

Evidently there is a circularity involved in validating intelligence tests by agreement with teachers’ estimates. But the point is that a test could not be accepted unless its results showed general agreement with the pooled judgments of teachers well acquainted with the children to whom the test is applied. Once the test has been thus validated, however, it can be used as a rapid and easily applied method of estimating the ability of large numbers of children, and this is of great advantage, especially when those who have to apply the test and act on its results are not well acquainted with the children concerned.

Clearly if a test has a low coefficient of reliability it has no significant validity. On the other hand a test may have a high reliability coefficient and yet not have a high degree of validity, for it may be self-consistent but nevertheless unsuitable for the purpose for which it was constructed. Thus high reliability is a necessary but not a sufficient condition of high validity.

Methods exist for estimating degree of relationship and expressing it by a correlation coefficient in cases where the data are not in terms of numerical measures (or where it is inconvenient to use these measures) but grouped in categories or classes.

Such cases vary from the one extreme where we know only the rank orders of members of a group in respect of two characteristics but do not know, or do not wish to use, their numerical measures, to the other extreme where all we know about each member of the group is whether he is above or below average in each of the two characteristics. In between we have the cases where the members of the group are classified in a few categories, in the way already considered. In all such cases methods exist for computing a coefficient indicating the degree of relationship in the group between the two given characteristics; but these coefficients are not so accurate as the correlation coefficient computed from the numerical measures, though they may be sufficient for some purposes and in any case provide a quick and simple method of making a more or less rough estimate of degree of relationship.

One of the best known and most useful methods of computing a correlation coefficient when only rank orders in a group are known deserves special mention. This is Spearman’s Method of Rank Differences. The coefficient found by it is usually denoted by ρ, and is obtained as follows.

Two lists are drawn up giving the orders in which the members of the group stand for each of the two characteristics concerned. Let D be the difference between the rank-order of any one member of the group in one list and his rank-order in the other list. Square D and then add all the D² together, giving the sum ΣD² for all the members of the group. If N is the number in the group, ρ is given by the formula

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
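Spearman’s formula is easily put into code. A minimal Python sketch, assuming no tied ranks (ties would call for averaged ranks, which are not handled here); the function names and the rank orders are invented for illustration.

```python
def rank_order(scores):
    """Rank 1 for the highest score; assumes there are no tied scores."""
    if len(set(scores)) != len(scores):
        raise ValueError("tied scores need averaged ranks, not handled here")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank-difference coefficient: 1 - 6 * sum(D^2) / (N(N^2 - 1))."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rank lists of equal length, at least 2 long")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical rank orders of five children in two characteristics.
first = [1, 2, 3, 4, 5]
second = [2, 1, 4, 3, 5]
print(spearman_rho(first, second))  # 0.8
```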


Even when numerical measures are available this method enables one to get a rough estimate of degree of relationship very quickly. The most probable value of r, the ‘true’ correlation corresponding to ρ, can be calculated, and tables exist for converting ρ to r. In general r is numerically slightly greater than the corresponding ρ. An example of the calculation of a rank correlation is given in Appendix I.
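The text does not say which conversion its tables are based on. One long-established relation, assumed here, is Pearson’s r = 2 sin(πρ/6), which presupposes that the two characteristics are normally distributed. A Python sketch with an illustrative function name:

```python
import math


def rho_to_r(rho):
    """Estimate r from Spearman's rho by Pearson's relation r = 2 sin(pi * rho / 6)."""
    if not -1 <= rho <= 1:
        raise ValueError("rho must lie between -1 and 1")
    return 2 * math.sin(math.pi * rho / 6)


for rho in (0.2, 0.5, 0.8):
    print(rho, round(rho_to_r(rho), 3))
```

For ρ between 0 and 1 the estimate of r comes out a little larger than ρ, in line with the statement above; at 0 and at 1 the two agree.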


Chapter Five 
THE FACTORS OF THE MIND 


THE fundamental hypothesis here is that all mental … are due, or behave as if they were due, to the operation of certain factors which can be analysed out and recognized.

The qualification ‘or behave as if they were due’ is 
important. I do not propose to discuss at length here the 
question of the status of mental factors. It is enough to 
say that the position in this case is analogous to that in
the physical sciences where we not infrequently speak 
and calculate in terms of factors which may have no 
kind of concrete existence at all and in some cases cer- 
tainly have not. They are convenient fictions which are 
of help—often great help—in co-ordinating our data and 
in simplifying our calculations. Whether mental factors 
have any kind of ‘real’ existence and, if so, in what sense 
is a question irrelevant to the present discussion. The 
purpose of mental measurement is to obtain certain 
kinds of data and to deal with these by a technique which 
shall be fruitful and effective in determining practical 
procedures for achieving certain ends mainly in the fields 
of education, vocation, and health in the fullest sense. 
If the factor symbolism makes a valuable contribution to this technique, that is a good and sufficient reason for adopting it. The philosophical question as to whether the factors symbolized ‘really exist’ is another matter altogether. These considerations should be borne in mind in all that follows.

The basic data from which factors are derived are provided by a table giving the intercorrelations among a number of particular tests set to a particular group of people. In other words factors first arise only in relation to particular tests and particular ‘testees’. This at once raises a fundamental point. It is clear that if we are going to get different factors every time we use different tests, or the same tests applied to different people, the factor technique is not going to be of much help. The only real justification for this technique would be if we were to find some factors, recognizably the same, turning up time after time in the course of our applications of all kinds of tests to all kinds of people. There is evidence that something of this kind is happening, as we shall see.

There are two main kinds of factors, ‘common’ and ‘specific’, defined in the first instance in relation to a given set of tests and testees. A common factor is one which influences the results of more than one of the tests. If it affects all the tests it is called a ‘general’ factor; if it affects some but not all of the tests it is called a ‘group’ factor.

A ‘specific’ factor is one which affects only one of the tests and has therefore a very narrow range. A true specific is determined only by the character of the test and the people to whom it is applied, but mixed up with it there is also an ‘error’ factor peculiar to the test in question. It is extremely difficult, if not impossible, to separate completely the true specific from the error factor, and in practice they are often lumped together in the course of the analysis as something specific to the test.

It will be apparent from the above that, as there is always a factor specific to each test, and almost always common factors as well, there will be, in the factor-analysis of a set of tests, at least as many factors as tests and nearly always more. Again, then, it may be asked why we should use factors at all: why not use test results direct?

As before the answer is that factorial technique may 
well prove to be worth while if we find the same common 
factors turning up frequently in different sets of tests 
applied to different groups of people. Such common 
factors would provide a sort of currency through the 
medium of which we could not only relate different sets 
of tests and different groups of people to one another, 
but also perhaps relate test results to types of education 
and vocation. Thus, for example, if a given occupation 
could be shown to require certain recognizable mental 
factors identical with some of the common factors we
had discovered, we might by using tests the results of 
which were known to be strongly influenced by those 
factors (tests ‘highly saturated’ with the factors) be able 
to pick out those individuals most suited for the occupa- 
tion in question. I shall return to this important point 
later. 

After factor analysis has been applied to a set of tests, simple equations can be framed which determine, for a given set of factor values, the corresponding test scores. The score in each test is, in the corresponding equation, expressed as a simple sum of the factor-values, each multiplied by a coefficient known as the ‘weight’ or ‘loading’ of the factor for the test, thus:

$$x = w_1 F_1 + w_2 F_2 + w_3 F_3 + \cdots$$

where x is the score in one of the tests, $F_1, F_2, F_3$, etc. the corresponding factor values, one factor-term being the specific, and $w_1, w_2, w_3$, etc. the loadings. x of course varies as $F_1, F_2$, etc. vary, but the coefficients $w_1, w_2$, etc. remain constant in the equation. They indicate the ‘saturation’ of the test with the corresponding factors, and also the proportionate influence of each factor on the variance of the test, the proportion being equal to the square of the factor loading. The values of all the variables in the equation are expressed in standard measure.
Incidentally in the kind of investigation with which we 
are concerned there are rarely more than three or four 
common factors of significant weight, and usually, 
though not always, we analyse into factors which are 
uncorrelated with one another, and are then said to 
form an ‘orthogonal set’. 
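The factor equation and the squared-loading rule can be sketched in Python. This assumes uncorrelated (orthogonal) factors, all in standard measure, and invented loadings for two tests; `implied_correlation` uses a standard consequence not stated in the text, namely that two tests then correlate to the extent of the sum of the products of their common-factor loadings. All function names are illustrative.

```python
import math


def specific_loading(common_loadings):
    """Loading of the specific factor that makes the test variance total 1 (standard measure)."""
    communality = sum(w * w for w in common_loadings)
    if communality > 1:
        raise ValueError("common-factor loadings account for more than the whole variance")
    return math.sqrt(1 - communality)


def variance_shares(common_loadings):
    """Proportion of test variance due to each factor: the squared loadings, specific last."""
    shares = [w * w for w in common_loadings]
    shares.append(specific_loading(common_loadings) ** 2)
    return shares


def score_from_factors(factor_values, loadings):
    """x = w1*F1 + w2*F2 + ... for one person, all values in standard measure."""
    return sum(w * f for w, f in zip(loadings, factor_values))


def implied_correlation(common_a, common_b):
    """Correlation of two tests from orthogonal common factors (specifics uncorrelated)."""
    return sum(a * b for a, b in zip(common_a, common_b))


# Hypothetical loadings on a general factor and one group factor.
loadings_a = [0.6, 0.5]
loadings_b = [0.7, 0.0]
print([round(s, 2) for s in variance_shares(loadings_a)])  # [0.36, 0.25, 0.39]
print(round(implied_correlation(loadings_a, loadings_b), 2))  # 0.42
```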

Unfortunately there now arises another serious diffi- 
culty, for it transpires that there is an unlimited number 
of ways in which a given set of tests can be analysed in 
terms of sets of factors, orthogonal or otherwise. 

Perhaps the simplest way of seeing how this comes about is by means of a geometrical analogy which is much used in factor analysis. In plotting graphs we generally refer the points and lines on the graphs to certain fixed straight lines known as ‘axes’ which are commonly, though not always, at right angles. In fixing these axes we can, of course, take them where we like, and our choice will depend on considerations of convenience and the purpose for which we are plotting the graph. Now scores on a test can be represented on a graph by points along a line. There will be a different line for each test but all will start from the same point. This point represents for each test the average or mean score made in the test by the members of the group we are considering; it is called the origin and is usually denoted by O. Scores one way along a test-line (or ‘test-vector’, to give it its proper name) from O will be above average, and scores the other way below average. The axes to which the points on the test-lines are referred, which also pass through O, represent factors into which the tests may be analysed. When there are more than two or three of these factor-axes the plot cannot of course be represented on paper, and algebraic methods have to be used. If the factor-axes are all at right angles to one another the factors are uncorrelated with one another and form an orthogonal set, to which reference has already been made. A simple illustration of the foregoing, with two test-vectors and two factor-axes, is given in Fig. 5.

[Fig. 5: two test-vectors referred to two factor-axes through O, with the components of each test-vector along the factor-axes.]

Now clearly the factor-axes can be twisted or rotated into an infinite number of different positions. Each position gives a different method of referring the test-lines to the axes. Hence we can get an infinite number of different factor-analyses for the same set of tests, the rotation of the axes corresponding to the algebraic process of transforming one possible factor-analysis of the tests into another possible factor-analysis.
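Numerically, turning a pair of orthogonal factor-axes changes every loading but not what the loadings reproduce: each test’s common-factor variance and the correlation between tests stay fixed, which is why the tests alone cannot decide between the rotated analyses. A Python sketch with invented loadings on two uncorrelated factors; the function names are illustrative.

```python
import math


def rotate_axes(loadings, degrees):
    """Loadings of each test on a pair of orthogonal factor-axes turned through `degrees`."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def implied_correlation(p, q):
    """Correlation of two tests reproduced from their loadings on two orthogonal factors."""
    return p[0] * q[0] + p[1] * q[1]


# Hypothetical loadings of two tests on factors I and II.
original = [(0.8, 0.3), (0.6, -0.4)]
for angle in (0, 30, 60):
    t1, t2 = rotate_axes(original, angle)
    print(angle, [round(x, 3) for x in t1], [round(x, 3) for x in t2],
          round(implied_correlation(t1, t2), 3))
```

At 0, 30 and 60 degrees the loadings differ, but the implied correlation stays at .36.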

The various methods of extracting the common factors 
from a set of tests proceed by ‘taking out’ first the com- 
mon factor which, of all the common factors derived 
by the particular method, accounts for the greatest 
proportion of the total test-variance. A table of ‘residual correlations’ is then formed by eliminating the effect of the first factor from the original correlation table. A second factor, next in order of effect on the variance, is then extracted from the residual table. This process is
repeated till we reach the limit of factors having any 
significant effect. As stated before we find, in general, 
not more than three or four such factors. 
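The text does not specify the extraction method (the centroid method was one in common use). The sketch below assumes instead a principal-axes style extraction by power iteration, with the communalities known and placed in the diagonal, which in practice they are not; the table is built from invented loadings and the function names are illustrative.

```python
import math


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def extract_factor(matrix, iterations=500):
    """Loadings of the largest remaining factor, by power iteration on the (residual) table."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(a * b for a, b in zip(v, matvec(matrix, v)))  # variance this factor accounts for
    return [math.sqrt(max(variance, 0.0)) * x for x in v]


def residual_table(matrix, loadings):
    """Remove the effect of one factor: each r_jk minus the product of the two loadings."""
    n = len(matrix)
    return [[matrix[j][k] - loadings[j] * loadings[k] for k in range(n)] for j in range(n)]


# Hypothetical table from a general factor g and a group factor on tests 1 and 2,
# with each test's common-factor variance (communality) placed in the diagonal.
g = [0.8, 0.7, 0.6, 0.5]
h = [0.3, 0.3, 0.0, 0.0]
table = [[g[j] * g[k] + h[j] * h[k] for k in range(4)] for j in range(4)]

first = extract_factor(table)
residuals = residual_table(table, first)
second = extract_factor(residuals)
left_over = residual_table(residuals, second)

print([round(x, 3) for x in first])
print([round(x, 3) for x in second])
print(max(abs(x) for row in left_over for x in row))  # effectively zero
```

The first factor taken out is a blend of g and the group factor rather than g itself, which is one reason why rotation is needed afterwards.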

It is sometimes found that certain factors have marked 
positive correlations with some of the tests in the set from 
which they are derived and marked negative correlations 
with others. In other words the presence of the factor 
tends to raise the ‘score’ in some tests and to lower it in others. Such factors are called ‘bi-polar’.

When we have extracted a set of common factors by 
one of the methods referred to we can transform it into 
alternative sets by rotation. This brings us to the crucial 
question: Which of the possible alternative sets of factors 
derived from a given set of tests shall we select as likely 
to be the most useful for practical purposes? 

Two obvious criteria of selection stand out, and have in fact been fairly generally adopted; one is mathematical and the other psychological. We might rotate the factor-axes till we reached a set of factors such that the equations connecting test scores and factor-values took the mathematically simplest form, or we might rotate till we came to a set of factors which seemed on the face of it to have a definite and familiar psychological significance.

With regard to the first alternative we might regard 
‘simple’ equations connecting tests and factors as those 
in which as many as possible of the common factors had 
zero weight, in other words equations expressing mea- 
surements for each test in terms of as few common 
factors as possible of all the common factors extracted 
from the whole set of tests. 

The second alternative calls for insight and imagina- 
tion, but, as we shall see, it does in fact happen that some 
factors seem to be recognizable as cognate with psychological processes already familiar from observation and
investigation in the fields of cognition, conation and 
feeling. 

This is not the place to pursue further the question of 
the selection of sets of factors. It need only be said that 
the experts concerned are pursuing their research with 
the aim, ideally, of finding sets which combine mathe- 
matical simplicity and psychological significance in the 
highest possible degree. We will, however, consider 
what has already been achieved in the way of picking out 
significant and recognizable factors. 

Truly general factors of recognizable psychological significance are few and far between, and yet it was the recognition of a general factor which gave the first great impetus to the development of factorial analysis. The outstanding pioneer in this field was the late Charles Spearman, and his fundamental discovery was that the inter-correlations among a set of tests tend to fall into a definite order of magnitude (known as ‘hierarchical order’) and that the scores in the tests could then be explained as due solely to the operation of a single general factor common to all the tests together with the specific factors for the particular tests.
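Spearman’s criterion for hierarchical order is commonly expressed through ‘tetrad differences’, which vanish when one general factor and the specifics explain all the correlations. A Python sketch, assuming invented g-saturations and illustrative function names, and adding a group factor to show a tetrad becoming non-zero:

```python
def correlation_table(general, group=None):
    """Correlations implied by a general factor plus (optionally) one group factor."""
    group = group or [0.0] * len(general)
    n = len(general)
    return [[1.0 if j == k else general[j] * general[k] + group[j] * group[k]
             for k in range(n)] for j in range(n)]


def tetrad_difference(r, a, b, c, d):
    """r_ab * r_cd - r_ac * r_bd; zero for every tetrad under one general factor and specifics."""
    return r[a][b] * r[c][d] - r[a][c] * r[b][d]


g_loadings = [0.9, 0.8, 0.7, 0.6]          # hypothetical saturations with g
hierarchical = correlation_table(g_loadings)
with_group = correlation_table(g_loadings, [0.4, 0.4, 0.0, 0.0])  # extra factor on tests 1 and 2

print(abs(tetrad_difference(hierarchical, 0, 1, 2, 3)) < 1e-12)  # True
print(round(tetrad_difference(with_group, 0, 1, 2, 3), 6))       # 0.0672
```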

Spearman's work lay for the most part in the cognitive
field and the tests he used were those usually known as 
'intelligence tests'. From his results he deduced the 
principle that performance in a cognitive test which would 
commonly be said to involve the operation of ‘intelli- 
gence' was determined by two factors only, namely a 
general factor g common to all such tests, and a factor 
specific to the test. This is the celebrated ‘Two-Factor 
Theory’. 

The great volume of discussion and controversy pro- 
voked by Spearman’s theory is now a matter of history 
and need not be enlarged upon here. It is enough to 
say that, with the reservation about the status of mental 
factors emphasized at the beginning of this section, 
Spearman’s main contentions have justified themselves. 
But as a result of discussion and subsequent experiment, 
in which psychologists like Godfrey Thomson and Cyril Burt in this country and Thurstone and Thorndike in America took a leading part, one important modification was introduced into the Two-Factor Theory, a modification which was in due course accepted by
Spearman himself. This was to the effect that it was 
not in general possible to explain the correlations be- 
tween cognitive tests solely by the general factor g plus 
the specifics: it was necessary to introduce other factors, 
each common to some but not all of the tests, and 
factorial analysis of the correlation tables did indeed 
yield such factors. Limited common factors of this 
kind became known as ‘group factors’ determining the 
operation of what were called ‘special aptitudes’ as
distinct from ‘general ability’ determined by g. 

There are a number of recognizable group factors which seem to turn up repeatedly and are now fairly generally accepted. Perhaps the most firmly established are a verbal factor v, a numerical factor n, and a mechanical
factor m, with the likely addition of a spatial factor k, 
that is a factor determining capacity in the compre- 
hension and imaginal manipulation of spatial relations. 
Group factors such as these operate in conjunction with 
the general factor g to a degree depending on the par- 
ticular type of situation involved. Deficiency in g may 
be compensated for in some situations by excess in one or 
other of the group factors, and vice versa. 

Another general factor which has been identified, and which operates in other fields beside the purely cognitive, is Webb’s ‘persistence of motives’ or ‘stability’ factor w. There is however some doubt as to the ‘purity’ of this factor and the same applies to another associated general factor, namely the ‘perseveration’ factor p. This factor determines the degree to which a train of activity of a certain type tends to persist and, as it were, to resist change to a train of activity of a different type. It is thus analogous to inertia in physics. Evidently it is in general desirable that it should be present in medium degree, for very low p is associated with the type of mind which cannot concentrate and carry on along one line for a substantial period but flits rapidly and often inconsequentially from one kind of activity to another, while very high p is associated with the somewhat ponderous type of mind which, once started in a certain direction, lumbers on like a steamroller resisting all attempts to deflect it, as it lacks the flexibility required to make quick, and perhaps unexpected, adjustments even when these may be urgently necessary.

Passing to the field of emotion and temperament, Burt has made a strong case for a general factor, e, which he calls ‘general emotionality’. Recent developments in this field have, however, been largely directed to the extraction of group factors. In this the approach has often been through a kind of technique different from that which we have so far been considering.
If we set a number of tests to a number of persons,
and tabulate the results, we have columns giving the
scores of the different persons in each test and rows giving
the scores of each person in the different tests. The
factorial analysis so far considered has been based on
tables of correlations between tests; that is, we have cor-
related the columns in our score tables, to find the way in
which the relations between the scores in each pair of
tests vary from one person to another. But suppose we
correlate the rows in our score table so that we find the
way in which the relations between the scores of each
pair of persons vary from one test to another. We shall
then get a table of correlations between persons, indicat-
ing the degree to which any two of the persons resemble
one another in the pattern of their scores in the various
tests. From such a correlation table common factors
can be extracted by the same process as before (with
certain modifications, especially as regards units) though
they will not, of course, be identical with the factors
extracted from the table of test correlations and will have
a different kind of significance. They are called
‘type-factors’ to distinguish them from the test-correlation
factors, which are called ‘trait-factors’.
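
The contrast between correlating columns and correlating rows can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the procedure of any particular investigator: the score table is hypothetical, the function names are hypothetical, and the "modification as regards units" is assumed here to be the conversion of each test to standard scores across persons before the rows are correlated.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def transpose(table):
    """Swap rows and columns of a list-of-lists table."""
    return [list(col) for col in zip(*table)]


def standardize(values):
    """Convert a list of scores to standard scores (mean 0, S.D. 1)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def correlate_tests(table):
    """Correlate the columns: one coefficient for each pair of tests."""
    cols = transpose(table)
    return [[pearson(a, b) for b in cols] for a in cols]


def correlate_persons(table):
    """Correlate the rows: one coefficient for each pair of persons.

    Assumption: each test is first put into standard scores across
    persons, so that scores on different tests are in comparable units.
    """
    z_cols = [standardize(col) for col in transpose(table)]
    rows = transpose(z_cols)  # back to one row per person
    return [[pearson(a, b) for b in rows] for a in rows]


# Hypothetical score table: one row per person, one column per test.
scores = [
    [12, 30, 25],
    [15, 34, 20],
    [9, 22, 28],
    [14, 31, 22],
]

trait_table = correlate_tests(scores)   # 3 x 3, between tests
type_table = correlate_persons(scores)  # 4 x 4, between persons
```

Factors would then be extracted from `trait_table` or from `type_table` by the same kind of process; that step is not shown here.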

These names indicate the difference in the significance
of common or group factors obtained respectively from
person-correlation and test-correlation. When we ana-
lyse test correlations the factors obtained indicate par-
ticular characteristics or traits possessed by the persons
concerned which determine their scores in the various
tests, the influence of each trait-factor on the score on
any one of the tests being shown by the weight or load-
ing of the factor in the equation connecting test score
and factor values, the factor-loading giving the degree
of ‘saturation’ of the test with the factor. But when we
analyse person-correlations the factors obtained indi- 
cate types of personality pattern to which each of the 
various members of the group concerned conform in a 
greater or less degree. In the corresponding equation 
connecting test score and factor values, the influence of 
each of these type-factors on test score is shown by the 
loading of the factor in the equation. Incidentally, it
should be remembered that the same raw test-score will
have different standardized values in the trait equations 
and the type equations as these values are measured from 
a different zero and in different units in the two cases. 
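
The point about zero and units can be checked on a small table. In this sketch, which assumes for simplicity that all the tests are marked on the same scale so that a person's raw scores can be standardized across tests directly, the trait value of a score is measured from the mean of its test and the type value from the mean of the person's own scores. The table and the function names are hypothetical.

```python
from statistics import mean, pstdev


def trait_standard_score(table, person, test):
    """Standard score within the test: zero and unit taken from the test's column."""
    column = [row[test] for row in table]
    return (table[person][test] - mean(column)) / pstdev(column)


def type_standard_score(table, person, test):
    """Standard score within the person: zero and unit taken from the person's row."""
    row = table[person]
    return (row[test] - mean(row)) / pstdev(row)


# Hypothetical raw scores (rows = persons, columns = tests),
# all tests assumed to be marked on the same scale.
scores = [
    [14, 10, 12],
    [16, 8, 18],
    [10, 12, 11],
]

trait_z = trait_standard_score(scores, person=0, test=0)
type_z = type_standard_score(scores, person=0, test=0)
```

The same raw score of 14 comes out at about 0.27 in trait units and about 1.22 in type units.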

The development of personality-type analysis is only 
in its early stages. It has, indeed, been applied in two
different, but not exclusive, ways, a narrower and a
broader. We may distinguish types in regard to their
response to a particular limited kind of situation, for 
example, reaction to aesthetic objects of a certain kind, 
or—and this is in general more interesting and likely to 
be more important—we can look for broader types of 
personality-pattern determined by factors which influ- 
ence behaviour over a wide field. 

Much research has been going on in connection with 
the extraction of the broader types of personality-factor, 
a good deal of the pioneer work being due to Sir Cyril 
Burt, and recently Raymond Cattell has produced evi-
dence for a number of such factors, the most important
of which are those determining what he calls Emotional
Integration and Character Stability, Dominance-Submissiveness,
Emotional Sensitivity-‘Toughness’, and Surgency-Desurgency.¹


¹ See his Introduction to Personality Study, p. 158 (Hutchinson's
University Library).

The names in the first three cases are self-explanatory,
but the connotation of the last pair may not be familiar to
some. Briefly, the ‘surgent’ individual is the cheerful,
active, sociable, trustful type while the ‘desurgent’ 
individual is depressed, sluggish, solitary, suspicious, 
though there are of course all degrees of variation be- 
tween these extremes on the Surgency-Desurgency 
Scale. 

Side by side with the search for type-factors there is 
the study of the relationship between type-factors and 
trait-factors. No firm conclusions have yet been reached 
on this point. Burt advanced the hypothesis that there 
is a reciprocity between the two kinds of factor such that 
the factor-values and the factor-loadings in the equations 
relating test scores and trait-factors become respectively 
the factor-loadings and the factor-values in the equa- 
tions relating test scores and type-factors. But although 
this may be so in special circumstances the contention 
that it cannot be true in general seems well-founded. 

Here we must leave this brief account of factor 
analysis and turn finally to some of the practical appli- 
cations of the technique of mental and educational 
measurement which we have been considering. 




Chapter Six 
PRACTICAL APPLICATIONS 


THERE are two points which repay consideration
before beginning a discussion of the practical 
applications of psychological measurement. The first 
concerns a danger, not inherent in the processes of 
measurement themselves, but in the effects of a certain 
attitude towards them. The second point is of a dif- 
ferent kind and is concerned with the nature of the 
constant factor, if any, in general intelligence and, con- 
sequently, with the respective influences of inborn, 
hereditary qualities and environment—nature and nur- 
ture. 

For many people the conception and technique of 
mental measurement have great attractions. They tend 
to become absorbed in the study of the subject and in its
applications and therefore to forget that it is but a
means to an end, and that only in certain special circum-
stances. They, and others, not uncommonly expect
from tests results which the tests are not capable of
giving and which they were never designed to give.
There is therefore a real danger that, in some cases,
testing and other forms of mental measurement may
tend to become ends in themselves without sufficient
regard to their limitations and to the circumstances in
which they can be profitably applied without harmful
effects.

One of the purposes of what I have written in the
previous pages has been to bring out clearly the true
nature of mental measurement, some of its possibilities, 
and some of the precautions that must be observed in 
using it. I will only add here that if children are given 
tests too frequently—with a consequent dispropor- 
tionate expenditure of time and work—or in unsuitable 
circumstances, or without a clear idea of the end that 
is being aimed at and an informed judgment of the value 
of that end, the educational results may be most unfortu-
nate. On the other hand, suitably applied, mental test-
ing of various kinds can make a most valuable contribu-
tion to educational work; but always remember that it is
only a tool. When you use it, be sure you understand
why, and take steps to assure yourself that what you pro-
pose to do is truly suitable for the purpose you have in
mind; and, above all, don't overdo it.

As regards the second point to which I referred, when
systematic and repeated intelligence testing first devel-
oped, the investigators were struck by the apparent
constancy, within what were regarded as the limits of
experimental error, of the intelligence quotient (I.Q.)
of a particular child. It was found that, although there
were occasional marked discrepancies, the mean varia-
tion between test and re-test, even when these were
separated by an interval of years, was only about 5 per
cent. As a result the hypothesis of the constancy of the
I.Q. was proposed and enjoyed a considerable vogue until
recent years. But as observations and experiments
accumulated, and ‘coaching’ for intelligence tests began
to be practised, it became apparent that the hypothesis
of constant I.Q. could not be retained without modifi-
cation or reservation of some kind.

There has been a good deal of argument, often rather
muddled argument, about this important point; but the
facts are, I think, reasonably clear. It is a matter of
common observation as well as of scientific determination 
that there is in each individual an innate constitutional 
factor which determines the effectiveness of his behavi- 
our in situations requiring the exercise of what would 
ordinarily be called ‘intelligence’. If this were not so it 
would be possible by appropriate treatment to turn 
everyone into a genius, or at least to raise his intelligence 
to a very high level. But nothing is further from the case, 
for it is again a matter of common observation that, 
within narrow limits, the ‘bright’ person remains bright, 
and the ‘dull’ person dull. At present we know of no 
means, educational or other, of altering this state of 
affairs. 

It is of course clearly possible to interfere with the 
effective exercise of intelligence. Illness, malnutrition, 
emotional disturbance, or the deliberate administration 
of certain drugs may greatly lower performance level, 
but that is another matter altogether. It brings us to the 
next point—the influence of environment on the exer- 
cise of intelligence. 

An individual’s behaviour in any situation is deter- 
mined by his own characteristic qualities on the one 
hand, and by the environmental character of the situa-
tion on the other. In the case of intelligent behaviour
the response of the individual is determined by the
nature of the situation with which he is faced, the level
of his innate intelligence factors, and his degree of skill
and judgment in exercising his intelligence, which will
depend to a very large extent on the environmental
factors (of which education is of course one of the most
important) which have been influencing him since birth.

The question whether heredity or environment has 
the most influence on intelligent behaviour is therefore
hardly significant. Intelligence and environment do not
supplement one another with additive effects; they are 
complementary. Intelligence can only manifest itself in 
an environment, and an environment has no meaning 
except in regard to a responsive organism.

One way of looking at innate intelligence is to regard
it as setting a limit to what it is possible for
the individual to perform and achieve. This limit is
probably never reached, for the environing conditions
are unlikely to be perfectly suited to the fulfilment of
it, but it can never be exceeded, and it varies
widely from one individual to another.

There are two analogies which are, perhaps, helpful
here, though they must not be pressed too far. We
might think of the intelligence of an individual as a kind
of container; it is significant that the word ‘capacity’
is often used in connection with intelligence. This con-
tainer grows during childhood but remains fixed in
size afterwards. Its size is determined by hereditary
constitutional factors. It may be large or small, but it
may also be filled to capacity, or nearly full; in fact it may be
filled to any degree up to the limit of its capacity. The
degree to which it is filled depends on environmental
factors.

Or we may look on intelligence as a kind of tool which
may be blunt or sharp, its degree of sharpness
being determined by the hereditary factor. This sets a
limit to what can be achieved with the tool, but within
that limit the skill with which the tool is used will depend
on environmental factors, including training.

To sum up, the potentiality of the individual in regard
to behaviour involving ‘intelligence’ is limited by an
innate factor, but the extent to which it is realized
is determined in general, and on particular occasions, by
all kinds of environmental influences. The innate factor 
may perhaps be identified with Spearman's general 
factor g, and it accounts for the great range of I.Q. varia- 
tion from one individual to another and for the tendency 
of the I.Q. to comparative stability in the case of a parti-
cular individual. The question remains as to the reasons
why there is some fluctuation, on occasions extreme, in the
I.Q. values obtained for the same individual.

These reasons are not far to seek. It is of course 
impossible to measure intelligence directly; one can 
only measure it indirectly by observing the results of its 
application in a particular situation. The situation which 
we devise for this purpose is the setting of an intelligence 
test. It is impossible to construct a test which is a pure 
measure of intelligence, though some tests approach 
this ideal more nearly than others; the results of the 
test will inevitably be influenced by, for example, such 
special factors as verbal, numerical and spatial aptitude 
in addition to general intelligence. The usual method 
of constructing a test is therefore to make it up of sub- 
tests requiring the exercise of intelligence through 
various media—verbal, numerical and so on—in various 
combinations, the overall result giving a reasonably 
accurate estimate of general intelligence. 
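
One simple way of combining sub-tests into an overall result is sketched below in Python: each sub-test score is turned into a standard score using that sub-test's norms, and the standard scores are averaged. The sub-test names, the norms and the equal weighting are all assumptions made for illustration; an actual test may weight and scale its sub-tests quite differently.

```python
from statistics import mean

# Hypothetical norms (mean, standard deviation) for each sub-test, assumed to
# come from a standardization sample; a real test would supply its own.
NORMS = {
    "verbal": (30.0, 6.0),
    "numerical": (20.0, 5.0),
    "spatial": (25.0, 4.0),
}


def composite_standard_score(raw_scores):
    """Average the sub-test standard scores, giving each sub-test equal weight."""
    z_scores = [
        (raw_scores[name] - norm_mean) / norm_sd
        for name, (norm_mean, norm_sd) in NORMS.items()
    ]
    return mean(z_scores)


# Hypothetical child: standard scores 1.0, 1.0 and 0.0 on the three sub-tests.
child = {"verbal": 36, "numerical": 25, "spatial": 25}
overall = composite_standard_score(child)
```

Because the average of several standard scores is generally less widely spread than any one of them, such a composite would usually be restandardized against its own norms before being expressed as a quotient.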

But the result of an intelligence test may be subject to 
disturbance by a different kind of cause. For physical or 
emotional reasons, or through the influence of some 
external factor, the individual may not be in a condition
to put up the best performance of which he is capable at
the time he takes the test. Hence it is unwise to rely on
a single test, especially when an important decision is to
be made as the result of it. A number of tests should be
set at intervals, the periods between the tests being long
enough to make it likely that the effects of temporary
disturbances will be minimized. It is not uncommon
when this is done to take the average I.Q. obtained for a
given individual from the series of tests as the final
estimate, but there is a good deal to be said for taking
the maximum I.Q. he obtains, as this will be the nearest
measure of his full capacity.
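
The two ways of arriving at a final estimate differ only in the summary taken from the series. A minimal sketch, with a hypothetical function name and hypothetical I.Q.s:

```python
from statistics import mean


def final_estimates(iq_series):
    """Two possible final estimates from a series of I.Q.s for one child."""
    return {"average": mean(iq_series), "maximum": max(iq_series)}


# Hypothetical I.Q.s from four tests set at intervals.
estimates = final_estimates([104, 98, 109, 106])
```

Here the average is 104.25 and the maximum 109; a single depressed performance pulls the average down but leaves the maximum untouched.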

These considerations indicate the precautions to be
taken in using intelligence tests and in interpreting their
results, but they do not detract from the great value of
the tests when properly applied. The available evidence
shows that tests of general intelligence are affected by
disturbing factors such as those mentioned to a con-
siderably less extent than are other kinds of examinations
and tests. But this leads to a matter which has been
much under discussion of late, namely the effects of
‘coaching’ on the results of intelligence tests.

The results of intelligence tests, being a measure of
intelligence in action, are undoubtedly affected by prac-
tice. If we consider first not special coaching but the
kind of practice which arises from the setting of tests
from time to time for various purposes as part of educa-
tional routine, it appears that, as compared with the I.Q.’s
resulting when children are faced for the first time with
this kind of test, there is a small but significant rise in
I.Q. values as the children become accustomed to the
new type of test and no longer regard it (perhaps with
some anxiety) as a suspicious novelty, but take it as a
matter of course. With the steadily increasing use of the
tests this effect tends to fade out, as might be expected.
At the most it does not seem to have, on the average,
more than about a 5 per cent. effect on I.Q. values. Inci-
dentally, it may be remarked that, to minimize it in the
case of children not yet fully accustomed to the tests, a
short ‘practice’ test (un-scored) is sometimes given just
before the test proper.

But what about the effects of special coaching? We 
have here to distinguish between becoming used to the 
tests as the result of general practice, a point to which I 
have just referred, and the effect on skill in applying 
intelligence to the working of the tests as a result of the 
planned systematic practice which we call ‘coaching’. 

Now the effect of disciplined practice in the exercise
of any ability, whether physical or mental, is to raise the
level of performance, but only within certain limits
determined by the innate constitutional factor in the 
individual. What is true of abilities in general is true of 
intelligence. Coaching in intelligence tests does improve 
performance and therefore raises I.Q. values, but only 
up to a certain limit for each individual; moreover, as 
testing at intervals becomes a more and more accepted 
and familiar feature of the educational scene, the effect 
of coaching is likely to approach its saturation point, 
while, here again, the effect of such special practice on 
intelligence test results seems to be less than in the case 
of other types of tests. 

In fact the evils of coaching do not lie in its effect 
on the test results. It can only improve individual per- 
formance up to a limit set by nature, the improvement 
being on the average not large, and it is indeed important 
to know this limit of capacity for each individual. Hence,
up to a point, coaching may be justified. It becomes a
danger only when, in their understandable anxiety to get
‘results’, teachers devote so much time to it as seriously
to upset the balance of the education which the children
are receiving; and it should be remembered, too, that
there is a rapidly diminishing return from coaching in
intelligence tests.

To sum up: by all means give children practice from
time to time in working tests in order to ensure that 
each child is fully accustomed to such tests and is given 
a fair chance of producing the best result of which he is 
capable; but stop well short of the point where such 
practice not only uses a disproportionate amount of time 
but begins to interfere seriously with the general curri- 
culum and to concentrate a quite undue and unwar- 
ranted amount of attention on the tests themselves. 

I will now go on to discuss briefly certain illustrative
points in the application of mental and educational
measurement in three fields, namely in the school itself, 
in the allocation of children to different types of Second- 
ary Education, and in vocational guidance. 


Chapter Seven 
IN SCHOOL 


THERE are four main ways in which the teacher can
make use of methods of mental measurement,
namely:

1. As an aid to classification and organization;
2. In relation to ‘standards’;
3. In the detailed consideration of individual cases; and
4. For the purpose of educational experiment.


Both intelligence and attainment tests are extremely
useful and economical of time in effecting provisional
classification. Intelligence tests indicate potentiality;
attainment tests indicate how far that potentiality has
been realized to date. To a headmaster of a large
Secondary Modern School, for example, faced with an
annual intake of perhaps a hundred or more pupils,
initial classification presents a difficult problem. He
cannot wait till the pupils have shown their calibre before
sorting them out into classes, and if there are many
mistakes in his initial classification the subsequent cor-
rection of these mistakes when, after a lapse of some time,
they have made themselves apparent, is likely to be
troublesome on both educational and organizational
grounds. Reasonably correct initial classification is thus
of great importance.


Records coming with the pupils from their
schools are of course of some help here, but they are
based on subjective judgments which may not always 
be reliable and which will have been formed in relation 
to variable subjective standards. School records are 
perhaps most valuable when they give information about 
special characteristics, whether emotional or intellectual, 
or about any other unusual features which are relevant. 
Apart from this they are best used as supplementary to 
objective tests. 

In making a preliminary classification it is not enough 
to assess intelligence. Owing to various causes a child's 
actual attainment may fall short, perhaps far short, of 
his capacity; and if he is placed in a class which, though 
appropriate to his intelligence, is about to take a course 
requiring a basic attainment well in advance of his own, 
he may feel out of his depth, and the difficulty and frustra- 
tion occasioned by this may seriously handicap his 
future progress. Hence it would be better to put him 
for a while in a class of less able children to give him a 
chance of making up leeway in attainment, after which 
he can be transferred to a higher class. In making a pre- 
liminary classification, therefore, it is important to con- 
sider the relation between the intelligence and the attain- 
ment of each pupil. 
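
The rule of thumb described above might be sketched as follows. Expressing both intelligence and attainment as standard scores, the threshold of one standard-score unit, and the function name are all assumptions for illustration rather than a recommended procedure.

```python
# Hypothetical threshold: how far (in standard-score units) attainment may lag
# behind intelligence before the pupil is placed by attainment instead.
LAG_LIMIT = 1.0


def provisional_placement(z_intelligence, z_attainment, lag_limit=LAG_LIMIT):
    """Return (score to place the pupil by, whether to review for transfer)."""
    lag = z_intelligence - z_attainment
    if lag > lag_limit:
        # Attainment well behind capacity: start in a class matching attainment,
        # and mark the pupil for possible transfer to a higher class later.
        return z_attainment, True
    return z_intelligence, False


placement_a = provisional_placement(1.2, -0.3)  # lag 1.5: placed by attainment
placement_b = provisional_placement(0.8, 0.5)   # lag 0.3: placed by intelligence
```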

I have been speaking of classification on the large
scale, namely the sorting out of all his pupils by a
headmaster. But the same principles apply, with appro-
priate modification according to circumstances, in deal-
ing with smaller units, as when a class teacher wishes to
divide his class into suitable ‘sections’.

The technique of mental measurement can provide
interesting and valuable information for the teacher
in regard to what are known as ‘standards’. There has
been much argument about the proper meaning of this
term, into which I do not propose to enter here. I shall
begin by assuming that the teacher has a general idea of
what he is talking about when he speaks of the ‘standard
of work’ in his class, and of the kind of question he is
asking when he inquires how this standard compares
with standards elsewhere.

If a teacher wishes to know how his class-standard in 
Arithmetic or English language, say, or in some aspects 
of these, compares with the general standard for the 
country he can, of course, only discover this on the basis 
of some performance by his class in a test which has 
already been applied to a group of children forming a 
sufficiently representative sample of the whole country. 
This at once commits the teacher to a definition, namely
that ‘standard’ is what is assessed by score in the test in
question. In fact, instead of talking about ‘comparison
of standards’, it is far better to talk about ‘comparison of
performances’ in certain specific tests. We are then deal-
ing with actual facts, the observation of which may pro-
vide the teacher with very useful information in regard
to the kind of results he is trying to get from his class.
Performance in tests is, of course, related to the somewhat
vague idea we have in mind when we speak about
‘standard’, but never mind that. Let the teacher con-
centrate on performance and the information the tests
can give him, information which, in value and in kind,
will vary with the individual teacher and with the cir-
cumstances in which he is teaching.


By devising objective tests himself (that is, tests made
up of questions which can clearly be marked right or
wrong according to a key independent of the marker)
and applying these tests to successive classes while
keeping records of the results, the teacher can judge how
performance is varying from one year to another. But
when he is using standardized tests to compare the per-
formance of his pupils with that of children in the country
at large, he will be well advised, before coming to
any conclusions, to test the intelligence of his pupils also.
In general the important question for a teacher is, or
should be, not ‘How well do my pupils perform?’ but
‘How does their performance as regards educational
attainment compare with their capacity as regards intelli-
gence?’ But this brings us to the third use of educational
measurement for the teacher, namely in the considera-
tion of individual cases.

Cases are not uncommon, in the experience of teachers, 
of children whose general behaviour appears to indicate 
a degree of intelligence which is not matched by their
educational progress. The lag in attainment may be
general or it may be limited to certain particular fields,
such as those of number or language.

Here a word of warning should perhaps be given. 
The impression created by a child’s behaviour and 
demeanour may be misleading. The cheerful, bright-looking,
willing child is apt to give the impression of a
degree of intelligence which he may not in fact possess.
Conversely, the dull-looking, perhaps rather sullen,
uncooperative child may be quite intelligent. It is therefore
unwise, in assessing a child’s probable level of intelli- 
gence, to rely too much on subjective impression in the 
absence of objective evidence. 

In cases of doubt, therefore, much useful information 
may often be obtained by applying objective standardized 
tests of intelligence and attainment. If there is a sub- 
stantial difference of performance level in the two tests 
there is a clear case for further inquiry. In comparing 
the performances it used to be customary to work in
terms of I.Q.’s and E.Q.’s (educational quotients)
calculated by the usual method of ratio of mental age
to actual age. The drawback to this was that if the 
spreads (S.D.’s) of I.Q.'s and E.Q.’s given by the respec- 
tive tests were different —and they generally were, E.Q.’s 
tending to be spread less than I.Q.’s—the results might 
be misinterpreted. Thus an E.Q. might be numerically 
lower than an I.Q., but when both were expressed in 
terms of their respective standard deviations the dif- 
ference between them might be much less. Hence in 
making comparisons of this kind it is safer to work in 
terms of standard scores than in terms of I.Q.'s and 
E.Q.’s calculated from mental ages. It is true that, 
statistically, there must be as many children with
standard I.Q. greater than E.Q. as there are with E.Q.
greater than I.Q., just as statistically there must always
be as many children above average as below average;
but this does not nullify the significance of a differ-
ence between the standard I.Q. and E.Q. of a parti-
cular child on the basis of comparison with his fellows.
Ideally the E.Q. of each child should equal his I.Q.,
and the apparent anomaly would then disappear.
Meantime the child with E.Q. greater than I.Q. is above
the average, and a child with E.Q. less than I.Q. is
below the average, of his fellows in the degree to which
his natural gifts have blossomed educationally.
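
The point about differing spreads can be checked with a little arithmetic. In the sketch below the ratio quotient is 100 times mental (or educational) age divided by actual age, as described above; the child's ages, the means and standard deviations of the two distributions, and the function names are hypothetical.

```python
def ratio_quotient(mental_age, actual_age):
    """Ratio quotient: 100 x (mental or educational) age / actual age."""
    return 100 * mental_age / actual_age


def standard_score(value, group_mean, group_sd):
    """Distance from the group mean in units of the group's standard deviation."""
    return (value - group_mean) / group_sd


# Hypothetical child aged 10 years (all ages in months).
iq = ratio_quotient(138, 120)  # mental age 11 years 6 months
eq = ratio_quotient(132, 120)  # educational age 11 years

# Hypothetical spreads: both quotients average 100, but I.Q.s have
# S.D. 15 while E.Q.s have S.D. only 10.
z_iq = standard_score(iq, 100, 15)
z_eq = standard_score(eq, 100, 10)
```

The E.Q. (110) is five points below the I.Q. (115), yet both standard scores are 1.0, so in comparison with his fellows the child's attainment is keeping pace with his intelligence.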

At this point it might be objected that if certain causes
are retarding educational progress the same causes may
well depress performance in an intelligence test, so that
comparisons may be unfruitful or positively misleading.
This objection is valid in principle, though in fact it
appears that intelligence test performance is in general
much less affected by disturbing causes than is educa-
tional performance. But in any case the objection is hardly
relevant, for one is only concerned with the situation
in which a significant difference is apparent between the
two performances, and whatever may be the cause of this
difference it is clearly a matter for further inquiry.

When the teacher has established to his satisfaction 
in the case of a particular child that there is such a 
significant difference the causes may be looked for in a 
number of directions. Progress may lag behind capa- 
city for health reasons, or because of some special 
mental or physical defect, or for emotional reasons arising 
from a variety of possible causes. Diagnosis is difficult 
here, and in pursuing his inquiries the teacher will do 
well to enlist the help of those experts who are specially 
qualified to co-operate with him in such inquiries. But, 
so far as the teacher himself is concerned, the value of 
the techniques of mental measurement is that, properly 
used, these techniques enable him to come to a decision as 
to when there is a case for further inquiry in the interests 
of the child. As an illustration I have discussed only 
one kind of situation in which mental measurement can 
be used as a starting point, but there are, of course, many 
others. 

A fourth way in which the teacher can make use of 
mental measurement is in connection with educational 
experiment and research. A typical example is that of 
the comparison of the effectiveness of different educa-
tional methods. Without standard objective techniques
for estimating and recording results such a comparison 
is apt to be futile. It is not usually easy for one teacher 
alone to carry out research of this kind; it can be more 
effectively organized by a group of teachers possibly in 
the same school, possibly in different schools. A kind 
of research more suitable for the individual teacher work- 
ing alone is that relating to such things as the growth 
of individual children in capacity and attainment over a
period of time, the development of interests and attitudes
in the individual, and the differences in these respects 
between different groups of individuals, including groups 
of opposite sex. Such studies are substantially helped 
by measurement techniques but they also require skill, 
experience and judgment.

I have already spoken of the principles of educational
research¹ and I shall not pursue the matter further here
beyond emphasizing again the importance, in such 
research, of taking suitable steps to ensure adequate 
controls and to guard against misinterpretation of the 
significance of results. I have dealt with these points 
in the previous passage referred to and will now pass 
on to a second field in which the techniques of mental 
measurement have become more and more prominent. 


¹ See pp. 28 ff.


Chapter Eight


ALLOCATION TO DIFFERENT TYPES OF 
SECONDARY EDUCATION 


WE must first consider the circumstances in which
this allocation now takes place. Children are to
be allocated to the various types of secondary education
(Grammar, Technical, and Modern) according to
their ‘age, ability, and aptitude’ in the words of the 1944
Education Act. Hence the selection should no longer be
regarded as in the nature of competition for some
‘superior’ type of secondary education, but as the means
of embarking each child on the course of secondary
education most suitable for him, from which he is there-
fore likely to get the greatest amount of satisfaction
and profit and the best preparation for his future life
and work. Unfortunately public opinion is still far from
adjusting itself to this new conception, but some of the
points raised in the sequel indicate ways in which this
adjustment can be encouraged.

We may identify ‘ability’ with the general intelligence 
factor and ‘aptitudes’ with group factors of the kind 
already mentioned. The first determines the general 
level of the child’s potentiality, the second his general 
bias, in particular whether he is more interested and
able in the manipulation of symbols or in the manipu-
lation of concrete material, and so whether he is more
fitted for a ‘Grammar’ or a ‘Technical’ education if his
general level is high enough.

We have therefore to carry out two main selection
processes, one in regard to general ability and the other 
in regard to special aptitude. The first is the more 
important, at least in present circumstances. 

A child’s level of general ability determines whether 
he is fitted for what I will, for convenience, call a 
‘higher’ (that is more advanced) or a ‘lower’ type of 
secondary education. As our educational system is at
present organized, ‘higher’ connotes Grammar or Tech-
nical and ‘lower’ connotes Modern. When we have
drawn our line across the list of children in order of 
general ability, separating the higher from the lower, 
we then have to decide whether those in the ‘higher’ 
section of the list should go to Grammar or to Technical 
education. There is no similar problem for the lower 
section of the list as the Modern school does not dif- 
ferentiate in general between the ‘academic’ and 

practical’ types though it does of course make suitable 

adjustments to meet individual needs whenever possible. 
But in any case most of those in the lower section of the 
LQ. list are more at home with practical methods of 
learning and the manipulation of concrete material than 
with the more abstract verbal and numerical symbols. 

Important questions arise in connection with selection on grounds of general ability. I shall not describe or discuss at any length here the techniques employed, as these are well known and have already been considered in various contexts in the foregoing. Intelligence tests are now frequently employed, usually with attainment tests (often limited to Arithmetic and English) of the objective type. Teachers’ estimates are sometimes brought in, though it has not yet been decided how best to use these, and some kind of ‘interview’ may be included.
The points I will consider arise out of the technique and seem to me to be of great importance. The first thing to note is that any selection technique which is reasonably appropriate prima facie will pick out, with very few exceptions, those who certainly ought to go to Grammar or Technical schools and those who certainly ought to go to Modern schools. The main difficulty comes in regard to the allocation of the children in the border-zone between these, and experts have been using all their ingenuity in attempts to devise more precise methods of sorting out these children.
What I shall suggest is that the time and labour spent
in such attempts to refine selection methods may well be 
misdirected and might be more profitably spent in 
attacking what I believe to be the real source of difficulty 
here. 

There are two kinds of reason why attempts to refine still further existing methods of 11+ selection for general ability may be misdirected. The first is that the errors inevitably inherent in the circumstances of allocation, apart from the details of selection technique, may quite likely themselves be greater than those errors arising in the technique itself as it now is. If that is so, it is largely a waste of time and effort to try to diminish errors arising from the technique while equal or greater errors are necessarily left unaltered. Such errors may arise in connection with the general principles of test construction. Thus, for example, the sampling on which the standardization of the tests is based may be inadequate, so that the group tested for this purpose may not be sufficiently representative. On the other hand, errors may arise in the way in which the results of the tests are applied by those who have to act on them. With the best will in the world such errors are bound to creep in, just as they do, though to a lesser extent, in the practical
application of even the most precise and rigorous 
sciences. 

The second kind of reason goes deeper. The children 
in the border-zone will be fairly closely bunched and 
human nature is anyhow very complex. It is therefore 
highly likely that, in the relevant respects, these children 
are strictly undifferentiable as regards allocation to a 
higher or a lower form of secondary education. That 
is, one cannot in general say, in the individual case,
whether a Modern or a Grammar (or Technical) school
would be more suitable. In other words the child 
might just as well go to either, with one reservation 
which strikes at the root of the matter. 

The reservation is that this child will find something 
suitable for him whether he goes to a ‘lower’ or a
‘higher’ school. The basic psychological fact is that the 
gradation of intelligence or general ability in children 
is continuous; there are no discontinuities which would 
provide convenient and significant divisions enabling 
us to sort out children into well-defined groups for which 
a limited number of distinctive types of education could 
be provided. 

It is of course impossible to meet this situation completely by a perfectly flexible and continuously variable system of secondary education. Material, human, and administrative factors limit the number of schools and teachers available, apart from anything else, and compel us to organize in terms of a relatively small number of types of education. We have in fact chosen three types based on certain wide and general differentia. But two things follow. The first is that the respective amounts of ‘higher’ and ‘lower’ provision are not, and cannot be, finally determined by the nature of the distribution of intelligence. They are determined by other factors such as
resources available, convenience of organization and 
administration, and the desirability of leaving the 
Modern school with a sufficient number of children above 
average in intelligence to provide a satisfactory kind of 
community life in school and to keep up the morale of 
all concerned. 

The second consequence is that, although the educa- 
tional course for the average pupil in the ‘lower’ school 
will rightly differ considerably from the course for the 
average pupil in the ‘higher’ school, there should— 
unless we are to fly in the face of the psychological facts 
to our cost—be continuity in type of education between 
the top end of the lower school and the bottom end of 
the higher school. If this is secured, as it can be, the 
child in the border-zone will find an education suitable 
for him whether he gets into the lower school or the 
higher school, and will not be educationally handi- 
capped if allocated to the former. 

But equality and continuity of educational opportunity 
as between higher and lower schools is only half the 
story. The other half is equality and continuity of 
vocational opportunity. One of the main reasons for 
the present intense competition to get into the higher 
schools is that parents and children alike see that many 
occupations are virtually closed to anyone who has not 
been to a higher school. 

This is all wrong. It is true of course that the top- 
ranking professions require qualities of intellect and 
attainment which will, quite naturally and properly, only 
be found in children receiving the kind of education 
given towards the top end of the higher school. But there 
are many occupations for which pupils from the lower 
end of that school and pupils from the top end of the 
lower school would be equally suitable. Relative suitability in individual cases would then depend on qualities other than the purely intellectual, and all should have the same chance. In other words attendance at some particular type of school should never, in itself, be regarded as
a necessary condition of entry into any occupation. 
Other necessary conditions for entry to some occupa- 
tions may only be possible of fulfilment by attendance at a 
higher school, but this is just because they require 
qualities which would, without doubt and not as a border- 
zone case, result in the allocation of the pupil to that 
type of school. But for the rest everything should be 
done to change the attitude of employers and others 
who require attendance at a higher school when there 
may be many pupils of the lower school quite suitable
for the occupations concerned. 
If equality and continuity of educational and voca- 
tional opportunity could be secured between different 
types of school many of our problems would be solved, 
and, in particular, the inevitably rather futile attempt 
to make more precise allocation would be rendered un- 
necessary. The time and energy now devoted to attempt- 
ing such refinements might more profitably be spent in 
working for a better understanding of what, in the light 
of the facts, should be the nature of the relationship 
between higher and lower schools.

The allocation of the abler children to Grammar or 
Technical secondary education presents a different type 
of problem to which mental measurement can make some 
… sharp differentiation into types. There is continuity of
variation in the bias from the one pole or extreme to the 
other, but it is a rather different kind of continuity from 
that apparent in the case of general intelligence. We 
might look upon the continuous variation of intelligence 
as analogous to variation in height-level; but the con- 
tinuous variation in degree of bias one way or the other is 
more analogous to the continuous variation of colour in 
the spectrum from the red at one extreme to the violet 
at the other. 

Progress in devising tests to assess Grammar-Tech- 
nical bias has been comparatively slow and I shall not 
discuss it here; but it seems clear from results already 
achieved that the technique of mental testing will have 
some contribution to make when the best method of
applying it has been discovered, though it is not yet
possible to see clearly the nature and extent of this
contribution.

One point that has emerged—not unexpectedly—is 
that there is a concentration about the middle part of 
the range of bias; that is, on the spectrum analogy, 
most people are ‘yellows’ or ‘greens’, substantially
fewer are ‘oranges’ or ‘blues’, and only a small number
are ‘reds’ or ‘violets’. In other words the majority of
children have no very marked bias one way or the other.
A fair-sized group have a distinct bias, but only very few
have so extreme a bias that they would be almost entirely
unable to profit by a secondary education with the opposite bias.

It is therefore important that our higher secondary 
education should be planned in such a way as to provide 
a wide measure of flexibility and of continuity between
the types of courses offered, and not dichotomized into
two sharply distinguished types of education. As far as
possible each pupil should be able to pick that combination
of courses most appropriate to the particular mixture
in his make-up. This is one of the main arguments
for the so-called ‘bi-lateral’ Grammar-Technical school,
and it is a weighty one; but here I have been concerned 
only to state the psychological facts which have to be 
accepted and dealt with as effectively as circumstances 
permit. 
Finally I will briefly consider a third field in which 
mental measurement is likely to make a valuable contri- 
bution, that of vocational guidance. 


Chapter Nine 
VOCATIONAL GUIDANCE 


The main factors determining the occupation or group of occupations for which an individual is most suitable are his general intelligence, his special aptitudes, and the nature of his particular interests and attitudes. Objective tests can do much to help in assessing the first two, while other techniques already referred to, such as the questionnaire and personality rating scales, may contribute to knowledge of the last. Hence
there is prima facie reason to suppose that mental mea- 
surement could make a substantial contribution to the 
solution of the problem of vocational guidance. 

It should be remembered that suitability for an occupation is relative. In regard to an individual, jobs do
not divide up into those for which he is completely suit- 
able and those for which he is quite unsuitable. He is 
just more suitable for some jobs than for others though 
the range of suitability will be, in general, from jobs 
for which he is very well fitted to those in which he is 
almost certain to be a failure. 

The ideal situation would be one in which we could discover the nature of the qualities required for particular jobs covering a very wide range, devise tests and other methods to assess those qualities, and then apply these tests to the individual, hoping as a result to be able to arrange all the jobs considered in order of his suitability for them.
Attempts have from time to time been made to do this.
Job analyses have been made and, as a result, large 
batteries of tests have been assembled—and in this con- 
text I will, for brevity, use the word ‘test’ to cover all 
the methods of assessing the qualities involved. The 
tests in the battery have been suitably weighted to 
provide equations which will give the best predictions 
of degree of success in particular jobs. The computa- 
tions involved are so lengthy and complex that automatic 
computers are employed. When the individual has been 
tested by the battery his scores in the various tests are 
entered on a card to be put in the machine which then 
proceeds to indicate his degree of suitability for the 
various occupations in terms of some such measure as 
percentile rank. 
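As a rough illustration of the procedure just described, the Python sketch below combines an individual's test results into a predicted degree of success for each occupation and expresses it as a percentile rank. Everything specific in it is hypothetical: the occupation and test names, the weights, the assumption that test results are already expressed as σ-values, and the assumption that predictions in the reference population are normally distributed with a known spread. In practice the weights would come from the kind of job analysis and weighting the text describes.

```python
from statistics import NormalDist

# Hypothetical prediction equations (illustration only): for each occupation,
# the weight given to the individual's σ-value on each test in the battery.
EQUATIONS = {
    "occupation_1": {"verbal": 0.5, "numerical": 0.3, "mechanical": 0.1},
    "occupation_2": {"verbal": 0.1, "numerical": 0.2, "mechanical": 0.6},
}


def predicted_success(sigma_values, weights):
    """Weighted sum of the individual's σ-values on the tests."""
    return sum(w * sigma_values[test] for test, w in weights.items())


def suitability(sigma_values, equations, spread):
    """Percentile rank of the predicted success for each occupation.

    Assumes (hypothetically) that predictions in the reference population
    are normally distributed with mean 0 and standard deviation `spread`.
    """
    dist = NormalDist(0, spread)
    return {job: 100 * dist.cdf(predicted_success(sigma_values, weights))
            for job, weights in equations.items()}


candidate = {"verbal": 1.0, "numerical": 0.5, "mechanical": -0.5}
ranked = sorted(suitability(candidate, EQUATIONS, spread=0.6).items(),
                key=lambda item: item[1], reverse=True)
for job, pr in ranked:
    print(f"{job}: P.R. {pr:.0f}")
```

Sorting by percentile rank gives the "order of suitability" the text speaks of; with real data the spread of the predictions would be estimated from a representative group rather than assumed.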

A process of this kind has its attractions and, if it could be carried out satisfactorily, would be most valuable, for it would … the employ… individual … the attempts … would have … been dis… … surprising, for there are at least … namely in the difficulty of the … the process starts, in the … the qualities which the job … contributing to success, and … tests to individuals. Consequently … can come to a definite conclusion as to the nature and
the extent of the contribution which the tests can 
make. 

One question of interest arises here. Shall we attempt 
to relate the individual to the job directly through tests, 
or indirectly through psychological factors of the kind 
we have already discussed? In the second case it would 
be necessary to factorize both the tests and the criteria 
resulting from the job analyses in terms of the same 
factors. The results of the tests would then show to what
degree the individual possessed the factors required for 
success in a particular job. 

At the moment this question can only be put—we do 
not yet know the final answer. It will depend largely on 
something which I have already mentioned in another 
context, namely whether we find that, in general, we 
continue to get different factors with different tests and 
different groups of people or whether at least a certain 
number of factors turn up as recognizably identical in 
different cases on different occasions. It would evidently 
simplify vocational guidance and other applications of 
tests considerably if we were in fact able to ‘isolate’ a 
number of factors as identifiable in this way. 

Meantime three points seem to emerge fairly clearly 
from investigations to date. The first is that no amount 
of testing and personality assessment can give as reliable 
information about suitability for a job as actually giving 
the individual a trial run in the job or in some situation
approximating to it as nearly as possible. This is hardly
surprising, but it is of course difficult to find sufficient
time and opportunity to carry out trials of this kind,
especially in view of the loss when the results are unfavourable.
But attempts at progress on these lines are
indicated, while at the same time search is made for
tests of higher predictive power with their great advantage
of economy in time and labour. 

The second point is that it seems easier to find the reasons for failure than the conditions for success. This is perhaps because, while the combination of a number of qualities may be necessary to success, one serious deficiency alone will lead to failure. A good deal of information has been obtained by investigating the cases of men and women who have had to leave jobs with which
they could not cope. This information is valuable 
because it may at least render it possible to discover the 
occupations for which the individual is definitely un- 
fitted, and this is a useful step forward. Moreover some 
of the specific deficiencies responsible for failure may be 
detected by testing, so that warning is available in 
advance. 
The third point, which overlaps with the second, is 
that in certain important respects there is a minimum 
in the degree to which the relevant quality is required 
below which failure is almost certain to result. In 
other words there are lower limits and it is perhaps here 
that testing can play its most important part in vocational 
guidance. 
The relevant considerations are perhaps most evident
in regard to general intelligence. Obviously an I.Q. 
of high order is required in, say, the holder of a chair at 
a university. No matter how excellent his qualities in 
other respects a professor could not carry out his duties 
effectively if he fell below a certain level of intellect. 
At the other extreme many unskilled and semi-skilled … condition of suitability for various types of occupation. But
the limits are not, and cannot be, rigidly drawn. As in 
the case of suitability for various types of education there 
will be border-zones of I.Q. within which comparative
success or failure in a given occupation will be finally
determined by qualities other than general intelligence. 
What has just been said is true of all the qualities rele- 
vant to work at a particular job. There are limits below 
which the level of such qualities cannot fall without 
failure resulting; but it will generally be more difficult 
to fix these limits than it is in the case of general intelligence. It is here that the various techniques of mental
measurement in all fields can play an important part.


Chapter Ten 


CONCLUSION 


WITH this consideration of some practical applications I will conclude. As I suggested at the
beginning of this book mental measurement has a part 
to play of the first importance, not only in developing 
psychology as a science, but also in enabling psychologists 
to treat human beings, individually and in groups, with 
deeper understanding and with results of benefit to all 
concerned. This contribution to human welfare will be 
the greater if the true nature and significance of measure- 
ment techniques, and above all their limitations, are 
constantly borne in mind. The understandable fears
of some at the rapid growth of the new science will then
no longer be justified.




Appendix One 
BASIC STATISTICAL CALCULATIONS 


For brevity and simplicity a small group of ten (N=10) is
used in the following to illustrate the calculations, but it must 
be remembered that, in general, for valid statistical conclusions 
much larger, and suitably representative, groups must be used. 
The principles of calculation are, however, exactly the same 
for large groups as for small, though the actual computation 
can often be abbreviated in the case of the former by splitting 
the whole group into small sub-groups or classes and treating 
these as individuals. 

Table I 

MEANS 

(See pages 14 ff.) 

The letters A, B, C etc. in the first column represent the 
members of the group. In the columns headed X and Y are 
their actual marks or measures (the ‘raw scores’) in respect
of two qualities or performances X and Y.


      X     Y
A    63    58
B    60    66
C    22    35
D    12    24
E    52    70
F    38    42
G    76    73
H    80    63
I    51    38
K    46    41

Sum (ΣX) = 500                  Sum (ΣY) = 510
Mean (ΣX/N) = 500/10 = 50       Mean (ΣY/N) = 510/10 = 51
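The sums and means of Table I can be checked with a few lines of Python. This is only a sketch. The dictionary layout, which maps each member's letter to his raw score, is an illustrative choice and not part of the original.

```python
# Raw scores from Table I (members A to K; the table has no J).
X = {"A": 63, "B": 60, "C": 22, "D": 12, "E": 52,
     "F": 38, "G": 76, "H": 80, "I": 51, "K": 46}
Y = {"A": 58, "B": 66, "C": 35, "D": 24, "E": 70,
     "F": 42, "G": 73, "H": 63, "I": 38, "K": 41}


def mean(scores):
    """Arithmetic mean: the sum of the raw scores divided by N."""
    return sum(scores.values()) / len(scores)


print(sum(X.values()), mean(X))  # 500 50.0
print(sum(Y.values()), mean(Y))  # 510 51.0
```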
Table II
MEAN DEVIATIONS AND STANDARD DEVIATIONS 
(See pages 16 ff.) 


The column headed x gives the deviations (plus or minus)
from the mean in the case of X-scores. The column headed
x² gives the squared deviations.

                    x       x²
A   63 − 50 =     +13      169
B   60 − 50 =     +10      100
C   22 − 50 =     −28      784
D   12 − 50 =     −38     1444
E   52 − 50 =      +2        4
F   38 − 50 =     −12      144
G   76 − 50 =     +26      676
H   80 − 50 =     +30      900
I   51 − 50 =      +1        1
K   46 − 50 =      −4       16

Sum (ignoring + and − signs) = 164        Sum (Σx²) = 4238

Mean deviation (Σ|x|/N) = 164/10 = 16.4
Mean square deviation, or variance (Σx²/N) = 4238/10 = 423.8
Root mean square (R.M.S.) or standard deviation (σx) = √423.8 = 20.59
The mean deviation and standard deviation of the Y-scores can be
calculated as an exercise. They will be found to be 15.0 and
16.12 (σy) respectively.
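The calculations of Table II can be sketched in Python as follows. The function name `dispersion` is illustrative. Deviations are taken from the mean and divided by N, as in the table, not by N − 1.

```python
import math

# Raw scores from Table I, members A to K in order.
X = [63, 60, 22, 12, 52, 38, 76, 80, 51, 46]
Y = [58, 66, 35, 24, 70, 42, 73, 63, 38, 41]


def dispersion(scores):
    """Return (mean deviation, variance, standard deviation), as in Table II."""
    n = len(scores)
    m = sum(scores) / n
    devs = [s - m for s in scores]                  # deviations x from the mean
    mean_dev = sum(abs(d) for d in devs) / n        # signs ignored
    variance = sum(d * d for d in devs) / n         # mean square deviation
    return mean_dev, variance, math.sqrt(variance)  # R.M.S. = S.D.


for name, scores in (("X", X), ("Y", Y)):
    md, var, sd = dispersion(scores)
    print(name, round(md, 2), round(var, 2), round(sd, 2))
```

For the X-scores this gives 16.4, 423.8 and 20.59. For the Y-scores it gives 15.0, 259.8 and 16.12.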
Table III
RANK ORDER, PERCENTILES, PERCENTILE RANKS, AND σ-VALUES
(See pages 20 ff.) 


In the columns headed X and Y the members of the group 
are arranged in order of their raw X and Y scores, the numbers 
in the brackets giving these scores. The next column gives the 
percentiles corresponding to the scores, and the next the per- 
centile ranks of the members of the group listed under X and 
Y. In the last column are the σ-values corresponding to the
P.R.s. These are obtained from a table (see page 21), a
shortened version of which is given in Appendix II.


Rank    X         Y         Percentiles    P.R.    σ-values
  1     H (80)    G (73)        90          95      +1.64
  2     G (76)    E (70)        80          85      +1.04
  3     A (63)    B (66)        70          75      +0.67
  4     B (60)    H (63)        60          65      +0.39
  5     E (52)    A (58)        50          55      +0.13
  6     I (51)    F (42)        40          45      −0.13
  7     K (46)    K (41)        30          35      −0.39
  8     F (38)    I (38)        20          25      −0.67
  9     C (22)    C (35)        10          15      −1.04
 10     D (12)    D (24)         0           5      −1.64


For example, the percentile of I’s X-score of 51, and of F’s Y-score of 42, is 40, as four members, or 40 per cent., of the group fall below this score. Therefore I’s P.R. for X, and F’s P.R. for Y, is 45, midway between the percentile for their scores and the next highest percentile (50) in the list. The corresponding σ-value is found from the table mentioned to be −0.13, that is, .13 of the standard deviation of X (in the case of I) and .13 of the S.D. of Y (in the case of F) below the mean.

Again, the percentile of A’s score in X and of B’s score in Y is 70, A’s P.R. in X and B’s P.R. in Y being 75. The corresponding σ-value is +0.67, i.e. .67 of the S.D. above the mean.
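The convention of Table III can be sketched in Python. The percentile is the per cent of the group below a score. The P.R. lies midway between that percentile and the next one up. Here `statistics.NormalDist.inv_cdf` from Python's standard library stands in for the printed probability integral table. The sketch assumes there are no tied scores, as in the example, and the function names are illustrative.

```python
from statistics import NormalDist

# X raw scores from Table I (members A to K).
X = {"A": 63, "B": 60, "C": 22, "D": 12, "E": 52,
     "F": 38, "G": 76, "H": 80, "I": 51, "K": 46}


def percentiles_and_prs(scores):
    """Map each member to (percentile, P.R.) by the convention of Table III.

    Percentile: per cent of the group scoring below the member.
    P.R.: midway between that percentile and the next one up.
    Assumes there are no tied scores.
    """
    n = len(scores)
    result = {}
    for member, s in scores.items():
        below = sum(1 for t in scores.values() if t < s)
        pct = 100 * below / n
        result[member] = (pct, pct + 50 / n)
    return result


def sigma_value(pr):
    """σ-value for a P.R., from the normal probability integral (2 places)."""
    return round(NormalDist().inv_cdf(pr / 100), 2)


table = percentiles_and_prs(X)
for member, (pct, pr) in sorted(table.items(), key=lambda kv: kv[1][1], reverse=True):
    print(member, X[member], pct, pr, sigma_value(pr))
```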

The numbers in the brackets on the same level give corresponding scores in X and Y. Thus an X-score of 46 is equivalent to a Y-score of 41, and an X-score of 76 is equivalent to a Y-score of 70.

Suppose we decide to express this equivalence by standard scores based on a mean of 100 and an S.D. of 15. Then I’s X-score of 51 and F’s Y-score of 42 are both translated into a standard score of 100 − (.13 × 15) = 98.05, and A’s X-score of 63 and B’s Y-score of 66 are both translated into a standard score of 100 + (.67 × 15) = 110.05. It will be seen from Table III that the median score in X (P.R. 50) is midway between 51 and 52, i.e. 51.5. The lower quartile (L.Q.) score (P.R. 25) is 38 and the upper quartile (U.Q.) score (P.R. 75) is 63. The interquartile range (I.R.) is 63 − 38 = 25 and the semi-interquartile range (S.I.R.) is 25/2 = 12.5.

The corresponding figures for Y are:
Median = 50, U.Q. = 66, L.Q. = 38, I.R. = 28, S.I.R. = 14.
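The standard scores and quartiles above can be reproduced with the following sketch. It assumes the Table III convention: in ascending order the k-th member (counting from 0) has P.R. 100(k + 0.5)/N. The score at a given P.R. is found by inverting that relation and interpolating between neighbouring members. The helper names are illustrative, not the author's.

```python
from statistics import NormalDist

# Raw scores from Table I, members A to K in order.
X = [63, 60, 22, 12, 52, 38, 76, 80, 51, 46]
Y = [58, 66, 35, 24, 70, 42, 73, 63, 38, 41]


def standard_score(pr, mean=100, sd=15):
    """Standard score for a P.R.: mean + σ-value × S.D.

    The σ-value is rounded to two places, as in Table III.
    """
    z = round(NormalDist().inv_cdf(pr / 100), 2)
    return mean + z * sd


def score_at_pr(scores, pr):
    """Raw score at a given P.R., interpolating between neighbouring members.

    Valid only for P.R.s between those of the lowest and highest members.
    """
    s = sorted(scores)
    pos = pr * len(s) / 100 - 0.5   # inverts P.R. = 100 * (k + 0.5) / N
    k = int(pos)
    frac = pos - k
    return s[k] if frac == 0 else s[k] + frac * (s[k + 1] - s[k])


print(round(standard_score(45), 2), round(standard_score(75), 2))
for name, scores in (("X", X), ("Y", Y)):
    median, lq, uq = (score_at_pr(scores, p) for p in (50, 25, 75))
    print(name, median, lq, uq, uq - lq, (uq - lq) / 2)
```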
Table IV
PRODUCT-MOMENT COVARIANCE AND CORRELATION
(See pages 40 ff.)
The columns headed x and y give the deviations of the raw scores in X and Y from their means (see Tables I and II). The column headed xy gives the algebraic product of these deviations. Remember that the mean Y-score is 51 and σy = 16.12.


       x       y        xy
A    +13      +7       +91
B    +10     +15      +150
C    −28     −16      +448
D    −38     −27     +1026
E     +2     +19       +38
F    −12      −9      +108
G    +26     +22      +572
H    +30     +12      +360
I     +1     −13       −13
K     −4     −10       +40


Algebraic sum = 2833 − 13 = 2820 (Σxy)

Covariance (Σxy/N) = 2820/10 = 282


The correlation coefficient (rxy) is

$$r_{xy} = \frac{\Sigma xy}{N\sigma_x\sigma_y} = \frac{2820}{10 \times 20.59 \times 16.12} \approx .85$$
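The covariance and product-moment correlation can be checked with this sketch. It follows Table IV exactly: deviations from the means, their algebraic products, division by N, and finally division by σx σy, with the S.D.s computed by the Table II method. The function name is illustrative.

```python
import math

# Raw scores from Table I, members A to K in order.
X = [63, 60, 22, 12, 52, 38, 76, 80, 51, 46]
Y = [58, 66, 35, 24, 70, 42, 73, 63, 38, 41]


def covariance_and_r(xs, ys):
    """Return (Σxy, covariance Σxy/N, product-moment r = Σxy/(N σx σy))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [a - mx for a in xs]                     # x deviations
    dy = [b - my for b in ys]                     # y deviations
    sum_xy = sum(a * b for a, b in zip(dx, dy))   # algebraic sum of products
    sd_x = math.sqrt(sum(a * a for a in dx) / n)
    sd_y = math.sqrt(sum(b * b for b in dy) / n)
    return sum_xy, sum_xy / n, sum_xy / (n * sd_x * sd_y)


sum_xy, cov, r = covariance_and_r(X, Y)
print(sum_xy, cov, round(r, 2))   # 2820.0 282.0 0.85
```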
We can estimate the most probable Y-score when the X-score is given, and vice versa. Thus suppose a member of the group had scored (say) 55 for X. This is a deviation (x) of 55 − 50 (50 is the mean X-score), namely +5. Then his most probable deviation score (y) in Y would be

$$y = r_{xy}\,\frac{\sigma_y}{\sigma_x}\,x = .85 \times \frac{16.12}{20.59} \times 5 \approx 3.33$$
Hence the most probable raw Y-score would be 51 (the mean of Y) + 3.33 = 54.33, or 54, as the actual score must be a whole number.
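The estimate can be sketched in Python. This uses the identity r(σy/σx) = Σxy/Σx², which holds because both sides equal the covariance divided by the variance of X. The sketch therefore works directly from the deviations, and the rounded value of r does not enter. The function name is illustrative.

```python
# Raw scores from Table I, members A to K in order.
X = [63, 60, 22, 12, 52, 38, 76, 80, 51, 46]
Y = [58, 66, 35, 24, 70, 42, 73, 63, 38, 41]


def most_probable_y(x_score, xs, ys):
    """Most probable Y-score for a given X-score (regression of Y on X).

    y = r (σy/σx) x, with x and y deviations from the means;
    r σy/σx is computed here as Σxy/Σx².
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sum_xy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sum_xx = sum((a - mx) ** 2 for a in xs)
    return my + (sum_xy / sum_xx) * (x_score - mx)


est = most_probable_y(55, X, Y)
print(round(est, 2), round(est))   # 54.33 54
```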


The standard error of rxy is

$$\frac{1 - r_{xy}^2}{\sqrt{N}} = \frac{1 - (.85)^2}{\sqrt{10}} \approx .09$$

The probable error is .6745 × .09 = .06 (approximately).
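The standard error and probable error depend only on r and N, so the sketch below takes r as the rounded value .85 from the text. The formula (1 − r²)/√N is a large-sample approximation, and for a group as small as ten it is only a rough guide.

```python
import math


def standard_error_of_r(r, n):
    """Approximate standard error of a correlation coefficient: (1 - r²)/√N."""
    return (1 - r * r) / math.sqrt(n)


se = standard_error_of_r(0.85, 10)
pe = 0.6745 * se          # probable error = .6745 × standard error
print(round(se, 3), round(pe, 3))
```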
Table V
RANK CORRELATION BY DIFFERENCES—SPEARMAN'S METHOD 
(See pages 48 f.)
The columns headed X and Y give the respective rank order of the members of the group in the two sets of scores (see Table III).
The column headed D gives the rank differences, and the column headed D² gives the squares of these differences.


      X     Y     D     D²
A     3     5     2      4
B     4     3     1      1
C     9     9     0      0
D    10    10     0      0
E     5     2     3      9
F     8     6     2      4
G     2     1     1      1
H     1     4     3      9
I     6     8     2      4
K     7     7     0      0

Sum (ΣD²) = 32
The rank correlation coefficient ρxy is given by the formula

$$\rho_{xy} = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 32}{10 \times 99} = 1 - \frac{192}{990} = 1 - .194 = +.806$$
It will be noted that the rank correlation coefficient ρxy = .806 is less than the product-moment correlation coefficient rxy = .85.
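Spearman's method can be sketched in Python. Each set of scores is ranked with 1 for the highest, the rank differences are squared and summed, and the formula is applied. This sketch assumes no tied scores, as in Table V. The function names are illustrative.

```python
# Raw scores from Table I.
X = {"A": 63, "B": 60, "C": 22, "D": 12, "E": 52,
     "F": 38, "G": 76, "H": 80, "I": 51, "K": 46}
Y = {"A": 58, "B": 66, "C": 35, "D": 24, "E": 70,
     "F": 42, "G": 73, "H": 63, "I": 38, "K": 41}


def ranks(scores):
    """Rank order with 1 for the highest score (assumes no ties)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {member: i + 1 for i, member in enumerate(order)}


def spearman_rho(xs, ys):
    """Spearman's rank correlation: 1 - 6ΣD² / (N(N² - 1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    sum_d2 = sum((rx[m] - ry[m]) ** 2 for m in xs)
    return sum_d2, 1 - 6 * sum_d2 / (n * (n * n - 1))


sum_d2, rho = spearman_rho(X, Y)
print(sum_d2, round(rho, 3))   # 32 0.806
```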
Ties are dealt with as follows. Suppose in a list of scores after (say) No. 5 the next two make the same score. Then each is given the rank of

$$\frac{6 + 7}{2} = 6\tfrac{1}{2}.$$

Again suppose after (say) No. 2 the next three make the same score. Then each is given the rank of

$$\frac{3 + 4 + 5}{3} = 4,$$

and so on.
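The rule for ties can be sketched as a small Python function. Each tied score receives the mean of the ranks it would otherwise occupy. For consecutive ranks starting at `first`, that mean is first + (count − 1)/2. The score lists below are hypothetical, chosen only to reproduce the two cases in the text.

```python
def ranks_with_ties(scores):
    """Rank scores with 1 for the highest; tied scores share the mean of
    the ranks they would otherwise occupy."""
    ordered = sorted(scores, reverse=True)
    result = []
    for s in scores:
        first = ordered.index(s) + 1             # first rank position held by s
        count = ordered.count(s)                 # number of members sharing s
        result.append(first + (count - 1) / 2)   # mean of the shared ranks
    return result


# Hypothetical scores: the 6th and 7th members tie, as in the first example.
print(ranks_with_ties([90, 85, 80, 75, 70, 65, 65, 60]))
# Hypothetical scores: the 3rd, 4th and 5th members tie, as in the second.
print(ranks_with_ties([90, 85, 80, 80, 80, 70]))
```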
Appendix Two

The table for converting percentage values (percentiles or
P.R.’s) to σ-values is called the ‘Probability Integral Table’.
Here is a short version of it. The σ-values are correct to the
second decimal place.


Percentage values     σ-values
      99.9             +3.09
      99.5             +2.58
      99.0             +2.33
      95               +1.645
      90               +1.28
      85               +1.04
      80               +0.84
      75               +0.67
      70               +0.52
      65               +0.39
      60               +0.25
      55               +0.13
      50                0.00
      45               −0.13
      40               −0.25
      35               −0.39
      30               −0.52
      25               −0.67
      20               −0.84
      15               −1.04
      10               −1.28
       5               −1.645
       1.0             −2.33
       0.5             −2.58
       0.1             −3.09
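A table like this can be regenerated from the inverse of the normal probability integral. The sketch below uses `statistics.NormalDist.inv_cdf` from Python's standard library as a stand-in for the printed table. It prints two decimal places throughout, so it shows ±1.64 where the table gives ±1.645.

```python
from statistics import NormalDist


def sigma_value(percentage):
    """σ-value below which the given percentage of a normal
    distribution lies (the inverse of the probability integral)."""
    return NormalDist().inv_cdf(percentage / 100)


for p in (99.9, 99.5, 99.0, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50,
          45, 40, 35, 30, 25, 20, 15, 10, 5, 1.0, 0.5, 0.1):
    print(f"{p:>5}  {sigma_value(p):+.2f}")
```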
Experimental group, 30


Factor axes, 54; rotation of, 54 
Factors, mental, 50, 87; common, 
51; bi-polar, 55; general, 51, 
57; group, 51, 57; specific, 51,
57; trait, 59; type, 59 
Freedom, 2 


General ability, 57 
General emotionality, 58 
General factors, 51, 57 
Group factors, 51, 57 
Guidance, vocational, 85 


Height, measurement of, 14 


Heredity, 13, 64 
Hierarchical order, 56 


Intelligence, 57, 64; tests, 4, 7, …
Interquartile range, 21, 36
Loadings, factor, 52


Marks, 15, 23 


Mean, 14; deviation, 16, 18 
Measurement, 1; criticism of, 2; mental, 4; spatial, 6; of
height, 14

Measure, Standard, 20, 43 

Median, 21, 36 

Mental age, 16, 26 

Mental factors, 50 

Merit, order of, 35 

Mind, 2 

Motives, persistence of, 58 


Normal distribution, 14 


Objective tests, 3, 7, 15 

Order, of merit, 35; hierarchical, 
56 

Origin of measurement, 12 

Orthogonal set of factors, 53 


Percentile, 20; rank, 20 
Perseveration, 58 

Persistence of motives, 58 
Personal equation, 28 

Persons, correlation between, 59 
Practice, effect on tests, 67 
Probable, error, 30; score, 44


Quartile, 21 

Questionnaire, 10, 36

Quotient, educational, 26, 73; 
intelligence, 26, 63, 73 


Rank, percentile, 20 
Rank correlation, 48 
Rating scales, 31 
Ratio, correlation, 45 
Raw scores, 15, 19; correlations, 45

Records, school, 70
Regression, 45
Reliability, coefficient of, 46
Research, educational, 75


Rotation, of factor-axes, 54 


Scale, 9, 10, 12; rating, 31 

School records, 70

Scores, raw, 15, 19; probable, 44 

Secondary Education, allocation 
to, 77 

Significance, 28, 76 

Spatial measurement, 6 

Spearman, Charles, 48, 56 

Special aptitudes, 57 

Specific factors, 51, 57 

Standard deviation, 17 

Standard Error, 28 

Standard Measure, 20, 43; score, 
22, 74 

Standardization of tests, 24 

Standards of work, 71 

Surgency, 60


Teachers’ estimates, 78 
Tests, intelligence, 4, 7, 15, 57, 79; objective, 3, 7, 15; of
composition, 9
Test-vector, 53 
Thomson, Sir Godfrey, 57
Trait-factors, 59
Two-Factor Theory, 57
Type-factors, 59 


Units of measurement, 12, 14;
equivalence of, 19 


Validity, coefficient of, 46 
Variance, 18, 39 
Vector, test, 53 
Vocational Guidance, 85 


Weighting, Scores, etc., 20 
Weights, factor, 52 


Zero point, 12 